This project constructs, from scratch, a statistical arbitrage strategy based on long/short trading of a cointegrated futures pair, together with the accompanying backtest. The numerical techniques for estimating the cointegrated relationship and the mean-reverting Ornstein-Uhlenbeck (OU) process are programmed from first principles, as are the trading strategy and the backtest.
The project encompasses cointegrated pair selection from a set of futures, the Augmented Dickey-Fuller (ADF) test with lag 1 and adjusted critical values, the Error Correction Model (ECM) of the pair in both directions, mean-reverting OU process fitting, trading signal generation and trading strategy design, parameter optimisation, and backtesting. During the backtesting phase, the effects of the bid-ask spread, transaction costs, a biased equilibrium, and the total number of trades on the success of the strategy are also explored. Inadequacies and potential improvements of the strategy are discussed, along with further remarks on the approach and the backtest results.
There are two primary sections to this report. In the first section of the study, the mathematics and theory behind numerical approaches, strategy, and backtest metrics are presented in depth. In the second section, a design for a cointegrated pair trading strategy based on the theory discussed in the first section is offered, along with optimization, backtesting, and discussions on the approach. 18 futures' 15.5-year daily historical data (2002-03-04 to 2017-10-05) is used for pair selection and trading strategy building; the subsequent 5-year daily historical data is used to test the success of the trading strategy (2017-10-06 to 2022-10-28). In addition, the daily historical data of the S&P500 stock index are collected to indicate the market return for the relevant period.
A pair is defined as two stocks/futures/interest rate instruments that tend to move in tandem. When a dislocation between the two price paths is observed, the method consists of trading the spread (a long position in one versus a short position in the other).
In the configuration of such a strategy, there are two distinct components: the selection component (which pairs to select) and the implementation of the trading method (when and in what sizes to trade). While these pairs can be selected solely on the basis of fundamental analysis, we will also present statistical analysis and employ a combination strategy based on both. [1]
The following introduces certain numerical methods and statistical analyses from first principles. The summary table below lists the section number and self-coding status of each technique to be introduced.
| Numerical Techniques | Serial No. | Python Code |
|---|---|---|
| OLS in matrix form | 1.1.2 | Appendix 2 function regress |
| Vector Autoregression (VAR) | 1.1 | Appendix 2 class VAR |
| AIC BIC | 1.1.3 | Appendix 2 method VAR.AIC_BIC |
| Stability Check | 1.1.4 | Appendix 2 method VAR.Stability |
| Engle-Granger Procedure | 1.3 | Appendix 2 class EG |
| Augmented Dickey Fuller Test | 1.3.1 | Appendix 2 method EG.ADF_Test |
| Critical Values for ADF Test | 1.3.1 | Appendix 2 function CriticalValue |
| Error Correction Equation | 1.3.2 | Appendix 2 method EG.error_correction |
| Johansen Procedure | 1.3.5 | Python Package statsmodels.tsa.vector_ar.vecm |
| Kalman Filter | 1.3.6 | Python Package pykalman |
| Ornstein - Uhlenbeck Process | 1.4.2 | Appendix 2 class OU |
Before introducing the cointegration pair testing methodologies, the Vector Autoregression (VAR) model is introduced. VAR is not directly related to the selection of the cointegration pair or the test for the cointegrated property, but it is an excellent method for demonstrating and analysing the relationships between the returns of the set of assets, as well as the foundation of the numerical method of matrix-form regression used in the subsequent sections.
As a system of endogenous variables whose values only depend on their historical values, VAR(p) is a crucial structural equation model of 'seemingly unrelated regressor' that can be calculated by row.
$$ y_{1,t} = \beta_{1,0} + \beta^{1}_{1,1}y_{1,t-1}+\beta^{1}_{1,2}y_{2,t-1}+\cdots+\beta^{1}_{1,n}y_{n,t-1}+\cdots+\beta^{p}_{1,1}y_{1,t-p}+\cdots+\beta^{p}_{1,n}y_{n,t-p}+\epsilon_{1,t}$$$$ y_{2,t} = \beta_{2,0} + \beta^{1}_{2,1}y_{1,t-1}+\beta^{1}_{2,2}y_{2,t-1}+\cdots+\beta^{1}_{2,n}y_{n,t-1}+\cdots+\beta^{p}_{2,1}y_{1,t-p}+\cdots+\beta^{p}_{2,n}y_{n,t-p}+\epsilon_{2,t}$$$$\cdots$$$$ y_{n,t} = \beta_{n,0} + \beta^{1}_{n,1}y_{1,t-1}+\beta^{1}_{n,2}y_{2,t-1}+\cdots+\beta^{1}_{n,n}y_{n,t-1}+\cdots+\beta^{p}_{n,1}y_{1,t-p}+\cdots+\beta^{p}_{n,n}y_{n,t-p}+\epsilon_{n,t}$$where $\beta^{k}_{i,j}$ are coefficients and $y_{j,t-k}$ is the lag-$k$ value of the $j^{th}$ variable.
In matrix form, the same system of equations can be represented. The benefit lies in the more straightforward formulation of analytical conditions and solutions.
$$Y_{t} = C + \beta_{1}Y_{t-1}+...+\beta_{p}Y_{t-p}+\epsilon_{t}$$where $Y_{t}=(y_{1,t},...,y_{n,t})'$ is $n \times 1$ column vector, and $\beta_{p}$ is $n \times n$ matrix of regression coefficients. Each lag $Y_{t-1}...Y_{t-p}$ will have its own matrix of coefficients $\beta_{p}$.
$$\beta_{p} = \begin{bmatrix} \beta^{p}_{1,1}&\cdots&\beta^{p}_{1,n}\\ \vdots&\ddots&\vdots\\ \beta^{p}_{n,1}&\cdots&\beta^{p}_{n,n}\\ \end{bmatrix}$$Instead of OLS line-by-line estimation, it is possible to estimate the VAR system of regression equations using the following matrix form solution, where the explanatory data matrix $Z$ is formed from the descending lagged dependent matrix $Y$ with ones in the top row and lagged $Y$ sequentially in the bottom rows:
$$Y = BZ + \epsilon$$$$\hat{B} = YZ'(ZZ')^{-1}$$$$\hat{\epsilon} = Y-\hat{B}Z$$where $\hat{B}$ is the estimated coefficient matrix of $\beta$, $\hat{\epsilon}$ is the estimated residual matrix/disturbance matrix.[2]
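As a concrete sketch of the matrix-form OLS solution above (the report's actual implementation is the `regress` function and `VAR` class in Appendix 2; the function name and array layout here are illustrative):

```python
import numpy as np

def var_fit(Y, p):
    """Estimate VAR(p) by matrix-form OLS: B_hat = Y Z' (Z Z')^{-1}.

    Y : (n, T) array, one variable per row.
    Returns B_hat (n x (n*p + 1)) and the residual matrix.
    """
    n, T = Y.shape
    T_eff = T - p                                  # usable observations
    # Z has a row of ones on top, then the lag-1 ... lag-p blocks of Y
    Z = np.ones((n * p + 1, T_eff))
    for k in range(1, p + 1):
        Z[1 + (k - 1) * n : 1 + k * n, :] = Y[:, p - k : T - k]
    Y_dep = Y[:, p:]                               # dependent observations
    B_hat = Y_dep @ Z.T @ np.linalg.inv(Z @ Z.T)
    resid = Y_dep - B_hat @ Z
    return B_hat, resid
```

Each row of `B_hat` holds the constant followed by the lag-1 through lag-p coefficient blocks of one equation.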
The estimator of the residual covariance matrix with $T\equiv N_{obs}$:
$$\hat{\Sigma} = \frac{1}{T} \sum\limits_{t=1}^T \hat{\epsilon_{t}}\hat{\epsilon_{t}}'$$The standard errors of beta coefficients will be inside the inverse of information matrix on the diagonal:
$$Cov\left[Vec(\hat{B})\right] = \left(ZZ'\right)^{-1} \otimes \hat{\Sigma} = I^{-1}$$where $\otimes$ is the Kronecker product.
The optimal lag $p$ is chosen by minimising the AIC and BIC statistics, which derive from the penalised-likelihood principle:
$$AIC = \ln\left|\hat{\Sigma}\right| + \frac{2k'}{T} \qquad\qquad BIC = \ln\left|\hat{\Sigma}\right| + \frac{k'\ln T}{T}$$where $k' = n \times (n \times p + 1)$ is the total number of coefficients in VAR(p) and $\left|\hat{\Sigma}\right|$ is the determinant of the residual covariance matrix.
The stability requirement is that all eigenvalues of the coefficient structure lie strictly inside the unit circle ($|\lambda|<1$). For $p=1$ this means the eigenvalues of $B_{1}$; for $p>1$ the lag matrices $B_{1},\dots,B_{p}$ should be assessed jointly through the eigenvalues of the companion-form matrix.[3]
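A minimal sketch of the AIC/BIC computation and the stability check (standing in for `VAR.AIC_BIC` and `VAR.Stability` in Appendix 2; the penalty scaling and the companion-matrix construction follow standard textbook forms and are assumptions of this sketch, not the report's exact code):

```python
import numpy as np

def aic_bic(resid, n, p):
    """AIC/BIC of a fitted VAR(p) from its (n x T) residual matrix."""
    T = resid.shape[1]
    sigma = resid @ resid.T / T            # residual covariance estimate
    k = n * (n * p + 1)                    # total number of coefficients k'
    logdet = np.log(np.linalg.det(sigma))
    return logdet + 2 * k / T, logdet + k * np.log(T) / T

def is_stable(B_lags):
    """Check VAR stability via the eigenvalues of the companion matrix
    built from the lag coefficient matrices B_1 ... B_p."""
    n, p = B_lags[0].shape[0], len(B_lags)
    companion = np.zeros((n * p, n * p))
    companion[:n, :] = np.hstack(B_lags)   # [B_1 B_2 ... B_p] in the top block
    companion[n:, :-n] = np.eye(n * (p - 1))
    return np.max(np.abs(np.linalg.eigvals(companion))) < 1.0
```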
A model-free endogenous system is established to regress futures returns on their previous (lagged) returns, for futures whose returns are stationary. Although Vector Autoregression fails to predict daily returns (roughly $O(100\%)$ to $O(200\%)$ deviation), it is a useful tool for estimating the correlation between the returns of several futures. We will attempt to determine the relationships between the futures returns in our data, identify the optimal lag, and examine the stability conditions. It should be noted, however, that correlation is not the cointegration that we employ in our trading approach.
Correlation reveals the collinearity between the returns of various futures. Cointegration describes the behaviour of a co-moving pair whose spread is stationary and possesses the feature of mean reversion. The VAR model cannot reveal cointegration, only correlation. When testing for assets that "move together", the correlation of the returns is the first step. However, strongly correlated returns provide little insight into future market movements, and the absence of correlation cannot be taken to imply independence. In addition, from the perspective of implementing a pair-trading strategy, a high correlation may prevent divergence and so decrease arbitrage opportunities.[1] The initial objective of running VAR on futures returns is to better comprehend the data. In addition, VAR calculated in matrix form provides the foundation of the other numerical approaches utilised in the project.
The probability density of a (strictly) stationary random process is independent of time. In the weaker sense used here, the low-order moments are invariant to time shifts: the mean and variance are constant, and the autocovariance depends only on the lag. A series should be subjected to a unit root test to determine whether it is a stationary process.
Engle and Granger[4] introduced the concept of cointegration:
A pair of asset $X_{t}$ and $Y_{t}$ is said to be cointegrated (of order 1) if:
$X_{t}$ and $Y_{t}$ are integrated of order 1, i.e. they are $I(1)$
There exist $\alpha$, $b$ such that the linear combination $Z_{t} = Y_{t}-\alpha X_{t}-b$ is $I(0)$ (i.e. the spread is stationary)
With the prior knowledge that $X_{t}$ and $Y_{t}$ are $I(1)$, there are two steps in the Engle and Granger procedure:
Cointegration permits the capture of stochastic trends shared by multiple processes. We can anticipate mean-reverting behaviour with cointegrated assets: by trading long $X_{t}$/short $Y_{t}$ when the residual is positive, we should gain positive P&L once the residual returns to its long-term level (and vice versa when $Z_{t}$ is negative). $\alpha$ should be subject to restrictions. For instance, $\alpha \leq 0$ indicates that $X_{t}$ and $Y_{t}$ must be both purchased or both sold, which is illogical for a mean-reverting strategy. Therefore, we can dismiss all pairs where $\alpha \leq 0$. [1]
There are two ways of cointegration estimation:
Engle-Granger Procedure for a pair of time series. It establishes cause/effect (a leading variable) and removes the ambiguity of non-unique cointegrating weights by normalising them to $[1, -\beta]$.
Johansen Procedure for cointegrating relationships in a multivariate situation. It is based on the theorem for a reduced-rank matrix with rows that are linearly independent.
In this project, the strategy for long/short trading consists of two assets; hence, the Engle-Granger Procedure should be used to estimate cointegration.
Engle-Granger procedure has two steps:
Step 1: Obtain the fitted residual and ADF-test for stationarity
Step 2: Plug the residual from Step 1 into error correction equation
which are explained in detail in 1.3.1 and 1.3.2. Johansen procedure will be explained in 1.3.5.
Regress asset A price $P^{A}_{t}$ on asset B price $P^{B}_{t}$, and test the fitted residual with the Augmented Dickey-Fuller (ADF) test with lag of 1. The regression can be expressed as:
$$P^{A}_{t} = \mu_{e} + \beta P^{B}_{t} + e_{t}$$where $\mu_{e}$ and $\beta$ are the coefficients of the regression, $e_{t}$ is the residual of the regression estimation. Thus, we can rewrite the formula into:
$$ e_{t} = P^{A}_{t} - \beta P^{B}_{t} - \mu_{e}$$the cointegrating vector $\beta'_{coint} = [1, -\beta]$ is called the loading of the trading strategy.
The ADF regression has the following general form:
$$ \Delta y_{t} = \phi y_{t-1} + \sum\limits_{k=1}^{p} \phi_{k}\Delta y_{t-k} + const. + \beta_{t}t + \epsilon_{t}$$where $\Delta y_{t}$ is the difference of the series under test, $\Delta y_{t-k}$ is the lagged $k$ difference, and $y_{t-1}$ is the lag-1 value of the series. $\phi$, $\phi_{k}$, $\beta_{t}$ and $const.$ are the estimated coefficients of the ADF regression, and $\epsilon_{t}$ is its residual. The $y_{t}$ in the equation should be the residual of the price regression, which is $e_{t}$. The lagged differences $\Delta y_{t-k}$ improve robustness when there is noticeable serial correlation.
An insignificant $\phi$ (i.e. $\phi = 0$) signifies a unit root in the series $y_{t}$; hence, we must examine the test statistic of the parameter $\phi$. In this project, the time-dependent trend term $\beta_{t}t$ is omitted ($\beta_{t}=0$) to prevent the fitting of a transitory trend and the overfitting that leads to spurious significance of $\phi$.
Thus, the Augmented Dickey-Fuller test with lag of 1 used in the project takes the form:
$$\Delta e_t = \phi e_{t-1} + \phi_{1} \Delta e_{t-1} + const. + \epsilon_{t}$$The parameter under test is $\phi$ and the hypotheses are:
Null Hypothesis $H_{0}$: The residual of the price regression $e_{t}$ is not stationary. / The residual of the price regression $e_{t}$ has a unit root. / $\phi$ is 0.
Alternative Hypothesis $H_{1}$: The residual of the price regression $e_{t}$ is stationary. / The residual of the price regression $e_{t}$ does not have a unit root. / $\phi$ is significantly different from 0.
The t-statistic for the parameter $\phi$ must be compared to critical values. As the ADF statistic does not follow the normal or t distribution, the critical values for the ADF test must be determined using the following formula:
$$ \beta_{\infty} + \beta_{1}/T + \beta_{2}/T^2 + \beta_{3}/T^3$$where $T$ is the number of samples; $\beta_{\infty}$, $\beta_{1}$, $\beta_{2}$ and $\beta_{3}$ are parameters that can be found in "Table 2" of the paper Critical Values for Cointegration Tests by James G. MacKinnon. The formula above is also explained in that paper. For this project, the parameters are those for the case "N = 2" (two variables) and "With Constant" in the table; $\beta_{3}$ has no value in this case.[5]
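The lag-1 ADF regression and the critical-value polynomial can be sketched as below (a simplified stand-in for `EG.ADF_Test` and `CriticalValue` in Appendix 2; the MacKinnon response-surface coefficients are deliberately passed in as arguments rather than hardcoded, since they must be taken from Table 2 of the paper):

```python
import numpy as np

def adf_lag1_tstat(e):
    """t-statistic of phi in: de_t = phi*e_{t-1} + phi1*de_{t-1} + const + eps."""
    de = np.diff(e)
    y = de[1:]                                   # dependent: delta e_t
    X = np.column_stack([e[1:-1],                # e_{t-1}
                         de[:-1],                # delta e_{t-1}
                         np.ones(len(y))])       # constant
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se

def mackinnon_cv(T, b_inf, b1, b2, b3=0.0):
    """Finite-sample critical value b_inf + b1/T + b2/T^2 + b3/T^3."""
    return b_inf + b1 / T + b2 / T**2 + b3 / T**3
```

If the t-statistic is more negative than the critical value, the unit-root null is rejected and the residual is judged stationary.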
If the residual is non-stationary, then no long-run relationship exists and the regression is spurious. In this case, we should try other pairs, as this pair does not demonstrate cointegration.
The second step of the Engle-Granger procedure is to confirm the significance of the correction term in the equations for $\Delta P^{A}_{t}$ and $\Delta P^{B}_{t}$:
$$\Delta P^{A}_{t} = \phi \Delta P^{B}_{t} - (1-\alpha) e_{t-1}$$The error correction term $e_{t-1}$ is the fitted residual from Step 1. Insert the lagged stationary residual from Step 1 into the error correction linear regression above and confirm the statistical significance of its coefficient $-(1-\alpha)$:
$$\Delta P^{A}_{t} = \phi \Delta P^{B}_{t} - (1-\alpha) \left(P^{A}_{t-1} - \beta P^{B}_{t-1} - \mu_{e}\right)$$The t-statistic of $(1-\alpha)$ should be compared to the critical values of the t-distribution to confirm its significance. The hypotheses are:
Null Hypothesis $H_{0}$: the coefficient $(1-\alpha)$ is 0.
Alternative Hypothesis $H_{1}$: the coefficient $(1-\alpha)$ is not 0.
Since the length of the training dataset is greater than 3700, the degrees of freedom of the corresponding t-distribution can be considered infinite. The two-tailed critical values of the t-distribution with infinite degrees of freedom at the 10%, 5%, and 1% significance levels are 1.645, 1.96, and 2.576, respectively.
As $(1-\alpha) \ll 1$, corrections to equilibrium occur in minuscule increments. Thus, $\Delta P^{A}_{t}$ can be expressed as:
$$\Delta P^{A}_{t} = \phi_{shortrun}\left(P^{B}_{t} - P^{B}_{t-1}\right) + \phi_{longrun}\left(P^{A}_{t-1}-\beta P^{B}_{t-1}\right)$$where the second term $\left(P^{A}_{t-1}-\beta P^{B}_{t-1}\right)$ captures the long-run move towards the mean/equilibrium of the residual $\mu_{e}$. The error correction equation should be applied in both directions, $P^{A}_{t}$ and $P^{B}_{t}$. The leading variable/futures price is chosen from the direction with the higher significance of the coefficient $(1-\alpha)$.[6]
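A minimal sketch of the Step-2 regression (standing in for `EG.error_correction` in Appendix 2; adding a constant to the regression is an assumption of this sketch). Running it in both directions and comparing the absolute t-statistics selects the leading variable:

```python
import numpy as np

def ecm_tstat(dP_dep, dP_exp, e_lag):
    """t-statistic of the error-correction coefficient -(1 - alpha) in
    dP_dep_t = phi * dP_exp_t - (1 - alpha) * e_{t-1} + const."""
    X = np.column_stack([dP_exp, e_lag, np.ones(len(dP_dep))])
    beta, *_ = np.linalg.lstsq(X, dP_dep, rcond=None)
    resid = dP_dep - X @ beta
    s2 = resid @ resid / (len(dP_dep) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se          # significantly negative => correction at work
```

In practice `e_lag` is the lagged fitted residual from Step 1 of the Engle-Granger procedure.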
Error Correction (EC) has a Vector Autoregression representation:
$$\Delta P^{A}_{t} = \beta_{1,1}(e_{t-1}-\mu_e) + \beta_{1,2}\Delta P^{B}_{t} + \beta_{1,3}\Delta P^{A}_{t-1}$$$$\Delta P^{B}_{t} = \beta_{2,1}CointFactor + \beta_{2,2}StaticEq + \beta_{2,3}Augment$$where $CointFactor = [1, -\beta][P^{A}_{t-1}, P^{B}_{t-1}]'$ and $[1, -\beta]$ are the non-unique cointegrating weights; $Augment$ is the lagged price change of the asset itself, and $StaticEq$ is the contemporaneous price change of the other asset. Written in matrix form, the above system becomes:
$$\boldsymbol{ \Delta P_{t} = \Pi P_{t-1} + \Gamma \Delta P_{t-1} + \mu_{0}}$$The challenge to robustness is a 'sudden' shift from $\mu^{Old}_{e}$ to $\mu^{New}_{e}$.[7]
The Autoregression model can be written in the form of a Vector Error Correction Model (VECM). Consider the AR(1) relation at horizon $\tau$:
$$\begin{align} e_{t+\tau} &= C + Be_{t} + \epsilon_{t+\tau}\\ e_{t+\tau}-e_{t} + e_{t}-Be_{t} &= C + \epsilon_{t+\tau}\\ \Delta e_{t+\tau} + (1-B)e_{t} &= (1-B)\mu_{e} + \epsilon_{t+\tau}\\ \Delta e_{t+\tau} &= -(1-B)(e_{t}-\mu_{e}) + \epsilon_{t+\tau}\\ &= -(1-\alpha)(P^{A}_{t} - \beta P^{B}_{t} - \mu_{e}) + ... \quad in\; ECM\\ &= \boldsymbol{\alpha(\beta'_{coint} P_{t}-\mu_{e})} + ...\quad in\; VECM\\ \end{align} $$The resemblance of the aforementioned mathematical form is significant because it connects the Ornstein-Uhlenbeck process, VECM, and cointegration and forms the basis of cointegration-based statistical arbitrage.
As discussed above, Vector Autoregression (VAR) is an endogenous system of return equations. It cannot be applied directly to non-stationary prices; such a model would be misspecified.
Prices can instead be modelled with error-correction equations: changes in prices $\Delta P_{t}$ are subject to a correction that drives long-term price convergence.
$$ \boldsymbol{\Delta P_{t} = \Pi P_{t-1} + \sum\limits_{k=1}^{p}\Gamma_{k} \Delta P_{t-k} + \epsilon_{t}}$$We have seen this equation in 1.3.3. The optimal lag $p$ can be found with the AIC and BIC statistics of VAR explained in 1.1.3. $\boldsymbol{\Pi}$ must have reduced rank, otherwise the rhs cannot balance the lhs: the stationary differences $\boldsymbol{\Delta P_{t}}$ on the lhs could not equate to the non-stationary, random prices $P_{t}$ on the rhs.
By decomposing the coefficients $\boldsymbol{\Pi = \alpha\beta'_{coint}}$:
$$\boldsymbol{\Delta P_{t} = \alpha(\beta'_{coint} P_{t}-\mu_{e})+ \sum\limits_{k=1}^{p}\Gamma_{k} \Delta P_{t-k} + \epsilon_{t} }$$We have also seen the equation above in 1.3.4, which is a VECM equation. $\mu_{e}$ is called a 'restricted constant' or deterministic trend.
The dimension for $\boldsymbol{\Pi= \alpha\beta'_{coint}}$ is $(n\times n) = (n\times r)\times(r\times n)$. $r$ columns of the vectorised $\boldsymbol{\beta'_{coint}}$ are linearly independent, which are cointegration vectors.
Eigenvalues of $\boldsymbol{\Pi}$ are utilised to compute both Trace Statistic and Max Eigenvalues Statistic:
Trace Statistic:
$${LR}_{r^{*}} = -T \sum\limits_{i=r^{*}+1}^{n} ln(1-\lambda_{i})$$
with hypothesis of $H_{0}: r=r^{*}$ and $H_{1}: r>r^{*}$.
Maximum Eigenvalue Statistic:
$${LR}_{r^{*}} = -T ln(1-\lambda_{r^{*} + 1})$$
with hypothesis of $H_{0}: r=r^{*}$ and $H_{1}: r=r^{*}+1$.
where $T$ is the number of samples, $r$ is the number of cointegration vectors, and $\lambda_{i}$ is the $i^{th}$ largest eigenvalue. The trace statistic tests whether the $n-r^{*}$ smallest eigenvalues are jointly 0, while the maximum eigenvalue statistic tests whether the $(r^{*}+1)^{th}$ largest eigenvalue is 0. In practice, the trace test or maximum eigenvalue test is a sequence of tests applied to a series of $r^{*}$ values.
Johansen procedure can be used for a set of cointegrating relationships in a multivariate setting[7]. The general steps of Johansen procedure are:
Select the optimal lag of VAR model
Regress on the VECM equation and calculate the trace/maximum eigenvalue statistics
Determine the rank $r$ of $\boldsymbol{\Pi}$ according to the statistics
In case of $r=0$: the rank of $\boldsymbol{\Pi}$ is 0 only when $\boldsymbol{\Pi}$ is the zero matrix. In this instance, the VECM degenerates to a VAR(p) in differences, indicating that there is no long-term link and that the variables are not cointegrated.
In case of $0<r<n$: $\boldsymbol{\Pi= \alpha\beta'_{coint}}$ has dimension of $(n\times n) = (n\times r)\times(r\times n)$, which means there are $r$ cointegration relationships in $n$ variables.
In case of $r=n$: In this case, $P_{t}$ is stationary itself and there is no need for cointegration analysis.
So the expected case for a cointegration relationship to occur is $0<r<n$.
The Johansen approach has made significant contributions to cointegration research in multivariate settings, although the Johansen cointegration test is not an ideal technique. When we reach the result $0<r<n$, we conclude that we have discovered several cointegration relationships, i.e. long-term equilibrium relationships. But which equilibrium relationship is the one we seek over the long term, and what do the others represent? Moreover, any linear combination of cointegration vectors is itself a cointegration vector. So how do we determine the desired cointegration vector? Each of these questions must be clarified.
First, we discuss the Kalman Filter (KF). KF is a linear-quadratic estimation algorithm introduced by Rudolf Kalman in 1960; the navigation system of the subsequent Apollo programme relied heavily on it. Measuring the system in the time domain, KF predicts the state of unknown variables at the next time step, even though the measurements contain noise and other inaccuracies. Under its model assumptions, KF is the optimal linear estimator.
KF can be utilised in any dynamic system with unknown inputs to forecast the subsequent state of the system in an adaptive manner. Even when environmental disturbances are numerous and complex, KF typically achieves highly precise results, and it performs well on continuously changing systems. During prediction, it is not necessary to record a large number of prior states of the system; only the error covariance matrix of the previous state's prediction is required. Therefore, KF can be computed quickly and utilised in real-time dynamic systems.
The Kalman filter handles the challenge of tracking the state of a system undergoing dynamic change. The model's fundamental assumptions include:
1) The state equation of the system is linear;
2) The observation equation is linear;
3) The process noise conforms to zero-mean Gaussian distribution;
4) The observation noise conforms to a Gaussian distribution with zero mean; hence, the Gaussian distribution is always applied in a linearly varying space, and the probability density of the state conforms to the Gaussian distribution.
It consists of two equations:
State Equation/Process equation: $x_{t} = A_{t}x_{t-1} + B_{t}u_{t} +\epsilon_{t}$
Observation Equation: $z_{t} = H_{t}x_{t} + \delta_{t}$
where the process noise $\epsilon_{t}$ follows a Gaussian distribution with mean 0, and the observation noise $\delta_{t}$ follows a Gaussian distribution with mean 0. We can use the following parameters to describe the whole problem:
Model of Kalman Filter
Algorithm of Kalman Filter
KalmanFilter($x_{t-1}$, $P_{t-1}$, $u_{t}$, $z_{t}$):
Prediction:
$\bar{x_{t}} = A_{t}x_{t-1} + B_{t}u_{t}$
$\bar{P_{t}} = A_{t}P_{t-1}A^{T}_{t} + R_{t}$
Correction:
$K_{t} = \bar{P_{t}} H^{T}_{t}\left(H_{t}\bar{P_{t}}H^{T}_{t} + Q_{t}\right)^{-1}$
$x_{t} = \bar{x_{t}} + K_{t}\left(z_{t}-H_{t}\bar{x_{t}}\right)$
$P_{t} = \left(I-K_{t}H_{t}\right)\bar{P_{t}}$
The five equations of the algorithm represent: the prediction of the state from the previous estimate; the prediction of the error covariance; the computation of the Kalman gain; the correction of the state by the measurement innovation; and the update of the error covariance.
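A minimal numpy sketch of one predict/correct cycle, following the notation of the algorithm above (note that it uses $R_t$ for the process-noise covariance and $Q_t$ for the observation-noise covariance, the reverse of the most common convention):

```python
import numpy as np

def kalman_step(x, P, u, z, A, B, H, R, Q):
    """One predict/correct cycle of the Kalman filter.
    R: process-noise covariance, Q: observation-noise covariance
    (matching the notation of the algorithm above)."""
    # Prediction
    x_bar = A @ x + B @ u
    P_bar = A @ P @ A.T + R
    # Correction
    K = P_bar @ H.T @ np.linalg.inv(H @ P_bar @ H.T + Q)
    x_new = x_bar + K @ (z - H @ x_bar)
    P_new = (np.eye(len(x)) - K @ H) @ P_bar
    return x_new, P_new
```

Repeatedly feeding noisy observations of a constant state drives the estimate toward that constant while the posterior covariance shrinks.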
Properties of Kalman Filter
Cointegrated prices have a mean-reverting residual/spread $e_{t} = \beta'_{coint} P_{t}$, where $\beta'_{coint} = [1, -\beta]'$ is also called the loading. When the residual/spread moves sufficiently above or below $\mu_{e}$, it serves as a signal to enter the trade; when it returns to around $\mu_{e}$, it serves as a signal to exit the trade. Arbitrage is generated by the relatively substantial price disparity between asset A and asset B, which is anticipated to revert in the future. The following are the loadings for each condition:
To make trading systematic and under control, we must fit the residual/spread into the Ornstein-Uhlenbeck process to obtain the mean-reverting parameters and calculate the entry and exit criteria.
As the empirical residual $e_{t}$ exhibits mean reversion, the Vasicek model can be chosen to describe the process. It has the stochastic differential equation (SDE):
$$de_{t} = -\theta(e_{t}-\mu_{e})dt + \sigma_{ou}dX_{t}$$where $X_{t}$ is a standard Brownian motion, $\theta$ is the speed of reversion to the equilibrium $\mu_{e}$, and $\sigma_{ou}$ is the diffusion (volatility) parameter of the SDE. The solution of the SDE has the form:
$$e_{t+\tau} = (1-e^{-\theta \tau})\mu_{e} + e^{-\theta \tau}e_{t} + \epsilon_{t+\tau}$$where $\epsilon_{t+\tau}$ is a Gaussian noise term. This equation can be written in the form of an autoregressive process with lag 1 (AR(1)):
$$e_{t} = C + Be_{t-1} + \epsilon_{t, \tau}$$where $B = e^{-\theta \tau}$, $C = \left(1-e^{-\theta \tau}\right) \mu_{e}$.
So the methodology is to fit the residual/spread obtained in Step 1 of the Engle-Granger procedure to the AR(1) model and obtain the parameters $B$ and $C$. Once we have $B$ and $C$, we can recover:
$$\theta = -\frac{\ln B}{\tau}, \qquad \mu_{e} = \frac{C}{1-B}, \qquad \sigma_{eq} = \sqrt{\frac{Var(\epsilon)}{1-B^{2}}} = \frac{\sigma_{ou}}{\sqrt{2\theta}}, \qquad \tilde{\tau} = \frac{\ln 2}{\theta}$$
$SSE$ is the sum of squared residuals of the AR(1) regression for the spread $e_{t}$; it gives the residual variance, with $SSE \times \tau = \Sigma_{\tau}$. $\tau$ is the discrete time step: typically it is set to $\frac{1}{252}$ for daily data, but in order to obtain an annualised $\sigma_{eq}$ regardless of the length of the training data, in this project $\tau$ is set to the reciprocal of the number of days of training data, which is $\frac{1}{len(data)}$.
The most important parameters for the trading strategy are $\sigma_{eq}$ and $\mu_{e}$. They determine the entry thresholds of the strategy as $\mu_{e} \pm Z \sigma_{eq}$. $Z$ is a parameter to be optimised according to the cumulative return or Sharpe ratio. $\sigma_{ou}$ is a parameter of the SDE; it is not used for signal generation in the trading strategy. $\theta$, $\tilde{\tau}$ and $\tilde{\tau}_{days}$ all describe the speed of reversion from different aspects. The most intuitive one is the half-life in days, $\tilde{\tau}_{days}$. Half-life is a physical notion that, in the mean-reversion process, specifies the average time after which the difference between the value of $e_{t}$ and the long-term mean $\mu_{e}$ halves:
$$ e_{t+\tilde{\tau}} - \mu_{e} =\frac{1}{2}(e_{t} - \mu_{e}) $$The half-life in days is the half-life expressed in trading days. Importantly, by the preceding definition, the half-life is not the average period a position is held: the average holding time is longer than the half-life. The calculation of the average holding time is beyond the scope of this report and of the trading strategy's development, so its formula is not derived here. Nonetheless, a longer half-life clearly corresponds to a longer average position holding duration.
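A sketch of the OU fitting step (a simplified stand-in for the `OU` class in Appendix 2): fit the AR(1) by OLS, then map $B$ and $C$ back to the OU parameters via $B=e^{-\theta\tau}$ and $C=(1-e^{-\theta\tau})\mu_e$. The residual-variance estimator used for $\sigma_{eq}$ is an assumption of this sketch:

```python
import numpy as np

def fit_ou(e, tau):
    """Fit e_t = C + B e_{t-1} + eps by OLS and map to OU parameters."""
    y, x = e[1:], e[:-1]
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    C, B = coef
    resid = y - X @ coef
    theta = -np.log(B) / tau                       # reversion speed
    mu_e = C / (1.0 - B)                           # long-run equilibrium
    var_eps = resid @ resid / len(resid)           # residual variance (SSE/N)
    sigma_eq = np.sqrt(var_eps / (1.0 - B ** 2))   # stationary std of the spread
    sigma_ou = sigma_eq * np.sqrt(2.0 * theta)     # SDE diffusion parameter
    halflife_days = -np.log(2.0) / np.log(B)       # half-life in time steps
    return {"theta": theta, "mu_e": mu_e, "sigma_eq": sigma_eq,
            "sigma_ou": sigma_ou, "halflife_days": halflife_days}
```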
The portfolio of assets A and B is $\Pi = A - \beta B$, where $\beta$ is the coefficient of the regression of the price of A on the price of B.
In this project's trading approach, there are three types of thresholds based on $\mu_{e}$ and $\sigma_{eq}$; trading signals are generated based on the relationship between the residual and these thresholds.
Thresholds of Entries: $\mu_{e} \pm Z\sigma_{eq}$
Thresholds of Exits: $\mu_{e} \pm Z_{e} \sigma_{eq}$
Thresholds of Stop-loss: $\mu_{e} \pm Z_{s} \sigma_{eq}$
$Z$, $Z_{e}$ and $Z_{s}$ are all parameters to be optimised, and they should satisfy $Z_{e} < Z < Z_{s}$. The metric for the optimisation is the Sharpe ratio.
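The threshold logic can be sketched as a small state machine over the standardised residual (illustrative only: this version goes flat on exit or stop-loss and does not implement the same-day position flipping permitted by the full strategy):

```python
import numpy as np

def signals(e, mu, sigma_eq, Z, Ze, Zs):
    """Positions from threshold crossings of the spread.
    +1 = long spread (long A, short beta*B), -1 = short spread, 0 = flat."""
    pos = np.zeros(len(e), dtype=int)
    state = 0
    for t, x in enumerate(e):
        z = (x - mu) / sigma_eq
        if state == 0:
            if z <= -Z:   state = +1            # spread cheap: enter long
            elif z >= Z:  state = -1            # spread rich: enter short
        elif state == +1:
            if z >= -Ze or z <= -Zs: state = 0  # take profit or stop-loss
        elif state == -1:
            if z <= Ze or z >= Zs:   state = 0  # take profit or stop-loss
        pos[t] = state
    return pos
```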
The approach also accounts for the bid-ask spread. The bid-ask spread is assumed to be constant over all trading days for both assets. The close price represents the midpoint between the ask and bid prices of each asset:
$$ P^{ask}_{t} = P_{t} + \frac{1}{2}spread $$$$ P^{bid}_{t} = P_{t} - \frac{1}{2}spread $$If the bid-ask spread is not 0, the trader must pay the ask price of the asset $P^{ask}_{t}$ when longing and the bid price of the asset $P^{bid}_{t}$ when shorting. As the bid-ask spread increases, the strategy's profit drops.
In the strategy, the transaction costs are assumed to be a fraction of the total amount of trading without any fixed costs. The transaction costs for longing($longcost$) and shorting($shortcost$) the asset could be different. With the presence of the bid-ask spread and transaction costs, the trading price of the portfolio $\Pi$ is:
Bid-ask spread and transaction expenses are anticipated to substantially reduce the total profit of this method, given that the traded portfolio is the residual of the asset values.
The trading method of the project permits position flipping, which implies that from a long position, close-out and entry into a short position are permitted on the same day, or vice versa.
In addition to the P&L of each trade throughout the backtest period, the key backtest findings are displayed as a table and 10 charts. Some of the backtesting platform's concepts are derived from [3] and [8].
The backtest table contains the following information:
Start Date: the start date of the backtest period.
End Date: the end date of the backtest period.
Total number of trades: the total number of trades executed during the backtest period. (One trade consists of one entry into a position and one corresponding exit/close-out.)
Average trades per year: the average number of trades per year during the backtest period.
Average trading days (Calendar): the average position holding time in calendar days.
Annual Return: the average annualised return of the strategy. (The return used here takes form of $\frac{V_{end}}{V_{start}}$.)
Cumulative Return: the cumulative return of the strategy.
Average Return per trade: the average return per trade of the strategy.
Annual volatility: the average annualised volatility of the strategy.
We will look at how to relate the profit and loss of the strategy to the market:
$$R^{S}_{t} = \alpha + \beta R^{M}_{t} + \epsilon_{t}$$where $R^{S}_{t}$ is the return of the strategy and $R^{M}_{t}$ is the market return, represented in this project by the return of the S&P500 index. $\alpha$ is the excess return after subtracting the return due to market movements. $\beta$ is the strategy's market exposure, for which we should not pay much, as it is easy to obtain by buying an S&P500 ETF.[3]
Annual Alpha: the average annualised $\alpha$(excess return) of the strategy.
Beta: the average $\beta$(exposure to the market risk) of the strategy.
Information Ratio(IR) focuses on risk-adjusted abnormal return, or risk-adjusted $\alpha$[3]:
$$IR=\frac{\alpha}{\sigma(\epsilon)}$$Sharpe Ratio measures return per unit of risk. It is similar to the information ratio in sense of definition:
$$Sharpe\,Ratio = \frac{\mathbb{E}\left(R^{S}_{t}-r_{f}\right)}{\sigma\left(R^{S}_{t} - r_{f}\right)}$$where $r_{f}$ is the risk-free interest rate, which is assumed to be 0 in this project.
Drawdown is the cumulative percentage loss from the most recent peak. Defining the highest past peak performance as the High Water Mark (${HWM}_{t}$), the drawdown (${DD}_{t}$) has the form:
$${DD}_{t} = \frac{{HWM}_{t} - P_{t}}{{HWM}_{t}}$$where $P_{t}$ is the cumulative return of the strategy or the portfolio value $\Pi_{t}$. Max drawdown is the maximum drawdown over the past period ${max}_{t\le T}{DD}_{t}$.[3]
Max Drawdown: the max drawdown of the strategy during the backtest period.
Daily Value at Risk(99%): the average daily Value at Risk at 99% of the strategy during the backtest period.
The strategy must be able to survive without running into a close-out. It makes sense to pre-define a Maximum Acceptable Drawdown (MADD) and track:
$$VaR_{t} \le {MADD}-{DD}_{t}$$If this relation holds for the entire backtest period, then the drawdown is under control.[3]
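A sketch of the headline risk metrics from a series of per-period strategy returns (the annualisation factor and the use of simple compounding are assumptions of this sketch):

```python
import numpy as np

def backtest_metrics(returns, periods=252):
    """Annualised Sharpe ratio (r_f = 0), drawdown series, and max drawdown."""
    sharpe = np.sqrt(periods) * returns.mean() / returns.std(ddof=1)
    cum = np.cumprod(1.0 + returns)          # cumulative return path
    hwm = np.maximum.accumulate(cum)         # high water mark
    dd = (hwm - cum) / hwm                   # drawdown at each step
    return sharpe, dd, dd.max()
```

The MADD condition above can then be checked pointwise against `dd` and a per-period VaR estimate.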
There are ten graphs used to illustrate the performance of the strategy during the backtest:
The backtesting platform can also provide profit and loss information for each trade. In our backtest, "return" refers to the multiples of value expressed as a percentage:
$${return} = \frac{V_{T}}{V_{0}}\times 100\% $$where $V_{T}$ and $V_{0}$ indicate the final and initial values of the trading account for each trade. The $V_{0}$ of the entire trading period is 1.
In this section, the cointegration pair selection and Engle Granger procedures are applied to daily futures data, and a detailed mathematical analysis is offered.
The corresponding code of 2.1 can be found in Appendix 1 - Data.ipynb
The data used for the construction of the statistical arbitrage strategy consists of daily closing futures prices from 2000-01-01 to 2022-10-31, sourced from Yahoo Finance. Initially, there are 20 distinct futures, but some lack comprehensive data for the target period. Futures whose available data cover less than 80 percent of the overall period are eliminated. The final dataset includes the daily close prices of 18 futures from 2002-03-04 to 2022-10-28. The following is a list of the codes and the corresponding product names of the futures:
| Code | Product Name |
|---|---|
| CC=F | Cocoa |
| CL=F | Crude Oil |
| CT=F | Cotton |
| GC=F | Gold |
| GF=F | Feeder Cattle |
| HE=F | Lean Hogs |
| HG=F | Copper |
| KC=F | Coffee |
| KE=F | KC HRW Wheat |
| LBS=F | Lumber |
| LE=F | Live Cattle |
| NG=F | Natural Gas |
| SB=F | Sugar |
| SI=F | Silver |
| ZC=F | Corn |
| ZO=F | Oat |
| ZR=F | Rough Rice |
| ZS=F | Soybean |
For simplicity of analysis, futures are referred to by their product names rather than their codes in the dataset. The full dataset is recorded in the futures.csv file. The graph labelled Futures Prices below depicts the price movements of all 18 futures over the 20-year period from 2002-03-04 to 2022-10-28.
It is plain to see that the price scale varies significantly across futures contracts. To provide a clearer picture, the following figure, titled Normalised Futures Prices, depicts the price paths normalised by their initial price of the period, so that the price trajectories of all futures begin at 1.0.
The paths intersect, and it is difficult to discern any evident co-movements from the plot. In April 2020, however, the price of Crude Oil experienced an abnormally large decline and briefly became negative. This interesting episode is beyond the scope of the current study, so I will not elaborate further on it.
The dataset is divided into a training set and a testing set with respective proportions of 0.75 and 0.25. The training set contains data from 2002-03-04 to 2017-10-05, saved as futures_train.csv, and is used for pair selection, cointegration analysis, and the construction and optimisation of the trading strategy. The testing set contains data from 2017-10-06 to 2022-10-28, saved as futures_test.csv; it is only used for testing the performance of the trading strategy.
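The normalisation and the chronological 75/25 split described above can be sketched with pandas as follows (the toy price frame is illustrative; the real code works on futures.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical frame of daily close prices, one column per future.
dates = pd.bdate_range("2002-03-04", periods=1000)
prices = pd.DataFrame(
    {"Feeder Cattle": np.linspace(80, 120, 1000),
     "Live Cattle": np.linspace(60, 90, 1000)},
    index=dates,
)

# Normalise every path by its first price so all trajectories start at 1.0.
normalised = prices / prices.iloc[0]

# Chronological 75% / 25% split: no shuffling, the test set strictly
# follows the training set in time.
split = int(len(prices) * 0.75)
train, test = prices.iloc[:split], prices.iloc[split:]
```

The chronological split matters: shuffling would leak future information into the training set.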
The daily close price data of the S&P500 ETF index for the specified time period (2002-03-04 to 2022-10-28) are acquired from Yahoo Finance and saved as SP500.csv. In the backtest, it is utilized to represent the market at that time period.
The corresponding code of 2.2 can be found in Appendix 2 - Numerical Methods and Pair Selection.ipynb
The code of the numerical methods in Appendix 2 is also recorded in numerical_methods.py for the use in other jupyter notebooks
In this section, the Vector Autoregression model (VAR) is applied to the daily return of the training dataset in order to better understand the relationship between the futures. Then, in Step 1 of the Engle-Granger procedure - the Augmented Dickey-Fuller test is used to identify the cointegrating future pair.
The VAR model is applied to the daily returns of the futures in the training set. The return is defined as the percentage change between the close prices of sequential trading days. The first things to check are the AIC, BIC and stability condition of VAR(p), where p is the lag from 1 to 10. The corresponding results are shown in the table below:
| Lag | AIC | BIC | Stability |
|---|---|---|---|
| 1 | -145.842 | -145.275 | True |
| 2 | -145.779 | -144.676 | True |
| 3 | -145.696 | -144.056 | True |
| 4 | -145.63 | -143.452 | True |
| 5 | -145.594 | -142.879 | True |
| 6 | -145.525 | -142.272 | True |
| 7 | -145.456 | -141.665 | True |
| 8 | -145.385 | -141.056 | True |
| 9 | -145.332 | -140.465 | True |
| 10 | -145.286 | -139.879 | True |
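Since the numerical methods in this project are built from first principles, a from-scratch sketch of the VAR(p) fit with AIC, BIC and the companion-matrix stability check might look as follows (OLS-based; the helper name and synthetic data are ours, and the AIC/BIC conventions may differ from the report's by an affine transformation):

```python
import numpy as np

def fit_var(returns, p):
    """OLS fit of VAR(p): y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + u_t.

    Returns AIC, BIC and the stability flag.  Stability requires all
    eigenvalues of the companion matrix to lie strictly inside the unit circle.
    """
    T, k = returns.shape
    rows = T - p
    X = np.hstack([np.ones((rows, 1))] +
                  [returns[p - j - 1:T - j - 1] for j in range(p)])  # lag blocks
    Y = returns[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)          # (1 + k*p, k)
    U = Y - X @ B
    sigma = (U.T @ U) / rows                           # residual covariance
    _, logdet = np.linalg.slogdet(sigma)
    n_par = k * (1 + k * p)
    aic = logdet + 2 * n_par / rows
    bic = logdet + np.log(rows) * n_par / rows
    A = B[1:].T                                        # (k, k*p) lag coefficients
    comp = A if p == 1 else np.vstack(
        [A, np.hstack([np.eye(k * (p - 1)), np.zeros((k * (p - 1), k))])])
    stable = np.all(np.abs(np.linalg.eigvals(comp)) < 1)
    return aic, bic, stable

# Synthetic 3-variable return data with mild lag-1 dependence.
rng = np.random.default_rng(1)
data = rng.normal(0, 0.01, (2000, 3))
data[1:] += 0.1 * data[:-1]
aic1, bic1, stable1 = fit_var(data, 1)
```

Because BIC penalises parameters more heavily than AIC for samples of this size, BIC exceeds AIC here, and the small lag coefficients keep the model stable.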
The stability condition requires all eigenvalues of the VAR companion matrix to lie strictly inside the unit circle. The eigenvalues of the selected VAR(1) model and their moduli are displayed in the following table:
| Eigenvalues (p = 1) | Modulus | Stable |
|---|---|---|
| 0.1224 | 0.1224 | True |
| 0.09 | 0.09 | True |
| 0.088 | 0.088 | True |
| 0.088 | 0.088 | True |
| 0.0816 | 0.0816 | True |
| 0.0816 | 0.0816 | True |
| 0.0669 | 0.0669 | True |
| 0.0361 | 0.0361 | True |
| 0.0359 | 0.0359 | True |
| 0.0359 | 0.0359 | True |
| 0.0572 | 0.0572 | True |
| 0.0572 | 0.0572 | True |
| 0.0675 | 0.0675 | True |
| 0.0675 | 0.0675 | True |
| 0.0562 | 0.0562 | True |
| 0.0562 | 0.0562 | True |
| 0.037 | 0.037 | True |
| 0.0129 | 0.0129 | True |
All eigenvalues have modulus less than 1, satisfying the stability condition.
The T-statistics of the VAR(1) model's coefficients are compared to their respective critical values. There are 18 distinct futures (18 variables) in our dataset, hence the Degree of Freedom (DF) is 17. For 10%, 5%, and 1%, the critical values of the two-tail T distribution with 17 degrees of freedom are 1.7396, 2.1098, and 2.8996, respectively. We confirm the significance of the coefficient using t-statistics at a significance threshold of 1%. The following are the test's hypotheses:
Null Hypothesis $H_{0}$: the coefficient is 0.
Alternative Hypothesis $H_{1}$: the coefficient is not 0.
The pairs with positive coefficient significantly not 0 are listed in the table:
| Variable | Lag(1) Variable |
|---|---|
| Crude Oil | Soybean(-1) |
| Cotton | Cotton(-1) |
| Feeder Cattle | Feeder Cattle(-1) |
| Feeder Cattle | Soybean(-1) |
| Lean Hogs | Soybean(-1) |
| Copper | Soybean(-1) |
| Lumber | Lumber(-1) |
| Live Cattle | Feeder Cattle(-1) |
| Live Cattle | Soybean(-1) |
| Soybean | Cotton(-1) |
| Soybean | Soybean(-1) |
There are a number of autoregression associations, including those between Cotton, Feeder Cattle, Lumber, and Soybean. The Live Cattle-Feeder Cattle pair is the only associated pair that does not include Soybean. These futures-based pairs are candidates for the statistical arbitrage trading strategy's cointegration pair.
The correlation between each pair of futures is calculated and sorted; the five pairs with the highest correlation coefficients are as follows:
| Future 1 | Future 2 | Correlation |
|---|---|---|
| Gold | Silver | 0.792950 |
| KC HRW Wheat | Corn | 0.582558 |
| Corn | Soybean | 0.546765 |
| Feeder Cattle | Live Cattle | 0.545642 |
| Copper | Silver | 0.452464 |
The trading strategy's pair could be one of the five most correlated pairs. The Augmented Dickey-Fuller test is applied to these pairs to determine whether there is cointegration between the two futures.
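The ranking above can be reproduced with a short pandas sketch (the function name and toy data are ours):

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(prices, n=5):
    """Rank all futures pairs by the correlation of their daily returns."""
    corr = prices.pct_change().dropna().corr()
    pairs = [
        (a, b, corr.loc[a, b])
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
    ]
    return sorted(pairs, key=lambda t: -t[2])[:n]

# Toy data: X and Y share a common driver, Z is an independent random walk.
rng = np.random.default_rng(2)
base = 100 + rng.normal(0, 1, 500).cumsum()
df = pd.DataFrame({
    "X": base + rng.normal(0, 0.5, 500),
    "Y": 0.8 * base + rng.normal(0, 0.5, 500),
    "Z": 100 + rng.normal(0, 1, 500).cumsum(),
})
best = top_correlated_pairs(df, n=1)[0]
```

On the toy data the most correlated pair is (X, Y), as expected, since both are driven by the same trend.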
Correlation does not imply cointegration; therefore, the ADF test should be used to confirm cointegration. The hypotheses are:
Null Hypothesis $H_{0}$: the residual of the regression between two futures is not stationary/the residual of the regression between two futures has unit root.
Alternative Hypothesis $H_{1}$: the residual of the regression between two futures is stationary/the residual of the regression between two futures does not have unit root.
The test results are shown below:
| Gold - Silver | Const | φ | φ1 |
|---|---|---|---|
| coef | -0.00154389 | -0.00337267 | 0.0299409 |
| std err | 0.00468862 | 0.00137355 | 0.0162973 |
| t-stats | -0.329285 | -2.45545 | 1.83716 |
| p-value | 0.74194 | 0.0140709 | 0.0661861 |

| KC HRW Wheat - Corn | Const | φ | φ1 |
|---|---|---|---|
| coef | 0.00999056 | -0.00548272 | 0.109632 |
| std err | 0.138169 | 0.00161452 | 0.0162014 |
| t-stats | 0.072307 | -3.39587 | 6.76682 |
| p-value | 0.942358 | 0.000684104 | 1.31648e-11 |

| Corn - Soybean | Const | φ | φ1 |
|---|---|---|---|
| coef | 0.0707712 | -0.0081612 | -0.0223824 |
| std err | 0.282901 | 0.00207715 | 0.0162923 |
| t-stats | 0.250163 | -3.92903 | -1.3738 |
| p-value | 0.802461 | 8.52878e-05 | 0.169503 |

| Feeder Cattle - Live Cattle | Const | φ | φ1 |
|---|---|---|---|
| coef | -0.00242809 | -0.0110054 | 0.010791 |
| std err | 0.0172081 | 0.00243957 | 0.0163036 |
| t-stats | -0.141102 | -4.5112 | 0.66188 |
| p-value | 0.887789 | 6.44613e-06 | 0.508048 |

| Copper - Silver | Const | φ | φ1 |
|---|---|---|---|
| coef | -0.00104694 | -0.00343318 | -0.00102636 |
| std err | 0.00699657 | 0.00135699 | 0.0163039 |
| t-stats | -0.149636 | -2.53 | -0.0629521 |
| p-value | 0.881052 | 0.0114063 | 0.949805 |
The important values are the T-statistics and the accompanying p-value of the $\phi$ coefficient. Among the top 5 most highly correlated pairs, Gold-Silver and Copper-Silver fail the ADF test: the null hypothesis cannot be rejected with sufficient confidence.
The KC HRW Wheat-Corn pair rejects the null hypothesis at the 5% significance level.
The Corn-Soybean pair rejects the null hypothesis at the 1% significance level.
The pair Feeder Cattle-Live Cattle rejects the null hypothesis with the highest level of confidence with T-stats of -4.5112 and p-value of 6.44613e-06. Thus, Feeder Cattle and Live Cattle futures are chosen for further cointegration study.
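Since the ADF test with lag 1 is implemented from scratch in this project, a minimal OLS-based sketch of the test regression might look as follows (the helper name and synthetic data are ours; the t-statistic of $\phi$ must be compared against Dickey-Fuller critical values, such as the adjusted -3.899 at 1% used later in this report, not against Student-t values):

```python
import numpy as np

def adf_lag1(e):
    """Augmented Dickey-Fuller regression with one lagged difference:

        Δe_t = Const + φ e_{t-1} + φ1 Δe_{t-1} + ε_t

    Fitted by OLS; returns the coefficients and t-statistics in the
    order (Const, φ, φ1).
    """
    e = np.asarray(e, dtype=float)
    de = np.diff(e)
    y = de[1:]                                           # Δe_t
    X = np.column_stack([np.ones_like(y), e[1:-1], de[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se

# A strongly mean-reverting AR(1) series should give a very negative φ t-stat.
rng = np.random.default_rng(3)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.8 * x[t - 1] + rng.normal()
coef, tstats = adf_lag1(x)
```

For an AR(1) with coefficient 0.8, the fitted φ is close to the true value of −0.2 and the t-statistic is far below any 1% critical value, so stationarity is confirmed.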
The graph Prices of Feeder Cattle and Live Cattle plots the price trajectories of the Feeder Cattle and Live Cattle futures. It is evident from the graph that the two paths move in tandem, supporting the conclusion derived from the ADF test.
As an additional exploration of analysis on the training set, the Johansen procedure is applied to: first, confirm the cointegration relationship we discovered between the Feeder Cattle and Live Cattle pair; and second, determine if there is a cointegration relationship between the futures of the same category.
The Johansen procedure is not self-built; rather, the Python package statsmodels.tsa.vector_ar.vecm is utilised. The Johansen test functions are coint_johansen() and select_coint_rank(). Both take as input the optimal lag of the VAR model, which in our case is 1 based on the analysis in 2.2.1.
The statistics table for the trace test, including the critical values, is as follows:
| Trace Test | Order 0 | Order 1 |
|---|---|---|
| Trace Stats | 22.4441 | 2.14927 |
| Trace CV 90% | 13.4294 | 2.7055 |
| Trace CV 95% | 15.4943 | 3.8415 |
| Trace CV 99% | 19.9349 | 6.6349 |
According to the trace test, the rank $r$ should be 1 at a significance level of 1%. Maximum eigenvalue test findings are:
| Max Eigenvalue Test | Order 0 | Order 1 |
|---|---|---|
| Max_eig Stats | 20.2949 | 2.14927 |
| Max_eig CV 90% | 12.2971 | 2.7055 |
| Max_eig CV 95% | 14.2639 | 3.8415 |
| Max_eig CV 99% | 18.52 | 6.6349 |
The conclusion is identical to that of the trace test. Thus, the Johansen test confirms the cointegration relationship between the Feeder Cattle and Live Cattle futures.
In this section, the Johansen test is used to determine whether grain futures have a cointegration relationship. The dataset contains five futures in the grain category: KC HRW Wheat, Corn, Oat, Rough Rice, and Soybean.
The statistics table of trace test together with the critical values is:
| Trace Test | Order 0 | Order 1 | Order 2 | Order 3 | Order 4 |
|---|---|---|---|---|---|
| Trace Stats | 120.141 | 79.5334 | 43.9919 | 17.8202 | 4.59003 |
| Trace CV 90% | 65.8202 | 44.4929 | 27.0669 | 13.4294 | 2.7055 |
| Trace CV 95% | 69.8189 | 47.8545 | 29.7961 | 15.4943 | 3.8415 |
| Trace CV 99% | 77.8202 | 54.6815 | 35.4628 | 19.9349 | 6.6349 |
According to the trace test, at the 1% significance level, the rank $r$ should be 3. The results of the maximum eigenvalue test are:
| Max Eigenvalue Test | Order 0 | Order 1 | Order 2 | Order 3 | Order 4 |
|---|---|---|---|---|---|
| Max_eig Stats | 40.6071 | 35.5416 | 26.1716 | 13.2302 | 4.59003 |
| Max_eig CV 90% | 31.2379 | 25.1236 | 18.8928 | 12.2971 | 2.7055 |
| Max_eig CV 95% | 33.8777 | 27.5858 | 21.1314 | 14.2639 | 3.8415 |
| Max_eig CV 99% | 39.3693 | 32.7172 | 25.865 | 18.52 | 6.6349 |
The conclusion is the same as that of the trace test. Since $0<r<n$ is satisfied, there are 3 cointegration relationships among the 5 futures.
In this section, the Johansen test is used to determine whether metal futures have a cointegration relationship. In the dataset, there are three metal futures: Gold, Silver, and Copper. Despite their significant correlation, the trace test and maximum eigenvalue test indicate the absence of a cointegration connection.
In this section, the Johansen test is used to determine if cash crop futures are cointegrated. The dataset contains four futures for the cash crop category: Cocoa, Coffee, Sugar, and Cotton. Trace test and maximum eigenvalue test provide $r=1$ at a significance level of 1%. Thus, one cointegration relationship exists between the cash crop futures.
In this section, the Johansen test is used to determine whether livestock futures exhibit a cointegration relationship. The dataset contains three futures for the category of livestock: Feeder Cattle, Live Cattle, and Lean Hogs. We are already aware of the cointegration link between Feeder Cattle and Live Cattle; therefore, there must be at least one cointegration relationship among livestock futures.
Trace test and maximum eigenvalue test yield $r=2$ at a significance level of 1%. Therefore, there are two cointegration correlations among livestock futures, satisfying the 'at least one' requirement.
In this section, the Johansen test is used to determine whether energy futures are cointegrated. The dataset contains two futures for the energy category: Crude Oil and Natural Gas. The results of the trace test and the maximum eigenvalue test indicate that there is no cointegration between the variables.
The corresponding code of 2.3 can be found in Appendix 3 - Engle-Granger and OU Process.ipynb
In this section, Steps 1 and 2 of the Engle-Granger procedure are applied to the futures pair in both directions, Feeder Cattle-Live Cattle and Live Cattle-Feeder Cattle. The direction with the greater statistical significance in Step 2 is picked for the development of the strategy. Once the direction is determined, the residual is fitted to an OU process to obtain the strategy's fundamental parameters.
First, we attempt to regress the price of Live Cattle against the price of Feeder Cattle (Live Cattle is the leading variable). The equation for regression is:
$$ P^{Live}_{t} = \mu_{e} + \beta P^{Feeder}_{t} + \epsilon_{t} $$The regression results are shown in the following table:
| $\mu_{e}$ | $\beta$ | |
|---|---|---|
| coef | 25.1685 | 0.624909 |
| std err | 0.409365 | 0.0030639 |
| t-stats | 61.482 | 203.959 |
| p-value | 0 | 0 |
where $\mu_{e} = 25.1685$ and $\beta = 0.624909$. T-statistics and p-values demonstrate that both coefficients are very significant. This conclusion is predictable: the variables of two future prices can be viewed as two stochastic processes; regressing one stochastic process with another stochastic process using a large number of samples always yields a statistically significant result. Thus, we have the predicted price of Live Cattle and the residual between the predicted and actual prices in the following forms:
$$ P^{Live}_{t} = 25.1685 + 0.624909\times P^{Feeder}_{t} + \epsilon_{t}$$$$ \epsilon_{t} = P^{Live}_{t} - 0.624909\times P^{Feeder}_{t} - 25.1685$$The graph above Live Cattle Prices - Predicted Live Cattle Prices illustrates the path of the price of Live Cattle and its predicted price derived from the regression algorithm. In general, the anticipated path closely parallels the actual path, particularly from 2008 to 2011. Nonetheless, there are a number of gaps where the residual is relatively large.
Residual of Live Cattle prices and predicted Live Cattle prices above plots the residual of predicted and real prices of Live Cattle. According to the plot, there is an obvious mean-reverting behaviour. The curve is more volatile at the beginning and ending of the period and less volatile in the middle.
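A minimal sketch of the cointegrating regression and residual construction described in this subsection (the helper name and toy data are ours):

```python
import numpy as np

def cointegrating_regression(y, x):
    """OLS fit of  y_t = mu_e + beta * x_t + e_t.

    Returns (mu_e, beta) and the residual series, which is then handed
    to the ADF test in Step 1 of the Engle-Granger procedure.
    """
    X = np.column_stack([np.ones_like(x), x])
    (mu_e, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - mu_e - beta * x
    return mu_e, beta, resid

# Toy cointegrated pair mimicking the fitted relation: y = 25 + 0.62 x + noise.
rng = np.random.default_rng(5)
x = 100 + rng.normal(0, 1, 3000).cumsum()       # random-walk "Feeder" price
y = 25 + 0.62 * x + rng.normal(0, 2, 3000)      # cointegrated "Live" price
mu_e, beta, resid = cointegrating_regression(y, x)
```

The residual of an OLS fit with an intercept has exactly zero mean, and the estimated slope recovers the true 0.62 closely.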
Step 1 of the Engle-Granger procedure entails applying the ADF test to the residual. The residual is fitted to:
$$\Delta e_t = \phi e_{t-1} + \phi_{1} \Delta e_{t-1} + Const. + \epsilon_{t}$$The test outcomes are:
| Const. | $\phi$ | $\phi_{1}$ | |
|---|---|---|---|
| coef | -0.00242809 | -0.0110054 | 0.010791 |
| std err | 0.0172081 | 0.00243957 | 0.0163036 |
| t-stats | -0.141102 | -4.5112 | 0.66188 |
| p-value | 0.887789 | 6.44613e-06 | 0.508048 |
Based on the test outcome, we have:
$$\Delta e_t = -0.0110054 e_{t-1} + 0.010791 \Delta e_{t-1} - 0.00242809 + \epsilon_{t}$$The $\phi$ T-stats value is -4.5112, which is to the left of the 1% critical value of -3.899. The associated p-value is 6.4461e-06, which is sufficiently small. Therefore, we can reject the null hypothesis with 99% confidence and infer that the residual is a stationary process without a unit root.
The second step of the Engle-Granger method is then applied to the residual. Fitting the error correction equation is:
$$\Delta P^{Live}_{t} = \phi \Delta P^{Feeder}_{t} - (1 - \alpha)\hat{e}_{t-1} + \epsilon_{t}$$where $\Delta P^{Live}_{t}$ and $\Delta P^{Feeder}_{t}$ are the changes of price on date $t$ of the corresponding futures, and $\hat{e}_{t-1}$ is the fitted stationary residual of date $t-1$ that we have obtained above.
It is required to confirm the significance of the $(1-\alpha)$ coefficient:
Null Hypothesis $H_{0}$: the coefficient $(1-\alpha)$ is 0.
Alternative Hypothesis $H_{1}$: the coefficient $(1-\alpha)$ is not 0.
The results are:
| $\phi$ | $(1-\alpha)$ | |
|---|---|---|
| coef | 0.500584 | -0.0105823 |
| std err | 0.0123218 | 0.00240012 |
| t-stats | 40.6258 | -4.40908 |
| p-value | 0 | 1.03812e-05 |
The t-statistic of $(1-\alpha)$ is -4.40908, to the left of the 1% critical value (-2.576). The corresponding p-value is 1.03812e-05, which is small enough. So we can reject the null hypothesis with more than 99% confidence and conclude that the coefficient $(1-\alpha)$ is not 0. The coefficient $\phi$ is significantly different from 0 as well.
The correction to equilibrium consists of very small steps, as $|(1-\alpha)| = 0.0105823 \ll 1$ demonstrates.
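The error correction regression of Step 2 can be sketched as follows (no intercept, as in the equation above; the simulated system and helper name are ours, with a true correction speed of 0.05 for illustration, and the true cointegrating relation is used as $\hat{e}$ for simplicity):

```python
import numpy as np

def error_correction_step(dy, dx, e_lag):
    """Engle-Granger Step 2: OLS fit of  Δy_t = φ Δx_t + c ê_{t-1} + ε_t,
    where the coefficient c on ê_{t-1} equals -(1-α); it should be
    significantly negative for correction toward equilibrium."""
    X = np.column_stack([dx, e_lag])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se           # order: [φ, -(1-α)]

# Simulated cointegrated system: y corrects 5% of the disequilibrium per step.
rng = np.random.default_rng(6)
n = 4000
x, y = np.zeros(n), np.zeros(n)
y[0] = 25.0
for t in range(1, n):
    x[t] = x[t - 1] + rng.normal(0, 1)
    e = y[t - 1] - 25 - 0.62 * x[t - 1]            # previous disequilibrium
    y[t] = y[t - 1] + 0.5 * (x[t] - x[t - 1]) - 0.05 * e + rng.normal(0, 0.3)
e_hat = y - 25 - 0.62 * x
coef, tstats = error_correction_step(np.diff(y), np.diff(x), e_hat[:-1])
```

The fitted coefficient on $\hat{e}_{t-1}$ comes out near the true −0.05 with a strongly negative t-statistic, the same sign pattern as in the tables above.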
Now we regress the price of Feeder Cattle depending on the price of Live Cattle (Feeder Cattle is the leading variable). The equation for regression is:
$$ P^{Feeder}_{t} = \mu_{e} + \beta P^{Live}_{t} + \epsilon_{t} $$The regression results are shown in the following table:
| $\mu_{e}$ | $\beta$ | |
|---|---|---|
| coef | -26.2894 | 1.46739 |
| std err | 0.777788 | 0.00719453 |
| t-stats | -33.8003 | 203.959 |
| p-value | 1.9551e-250 | 0 |
where $\mu_{e} = -26.2894$ and $\beta = 1.46739$. The T-stats and p-values show that both coefficients are significant. The result is foreseeable as mentioned in 2.3.1. Thus, we have the predicted price of Feeder Cattle and the residual between the predicted price and actual price in forms of:
$$ P^{Feeder}_{t} = -26.2894 + 1.46739\times P^{Live}_{t} + \epsilon_{t}$$$$ \epsilon_{t} = P^{Feeder}_{t} - 1.46739\times P^{Live}_{t} + 26.2894$$The plot above Feeder Cattle Prices - Predicted Feeder Cattle Prices illustrates the trajectory of Feeder Cattle prices and their predicted prices derived from the regression model. The anticipated path generally tracks the actual path pretty accurately. Nonetheless, there are a number of gaps where the residual is relatively large.
Residual of Feeder Cattle prices and predicted Feeder Cattle prices above plots the residual of the predicted and actual prices of Feeder Cattle. According to the plot, there is an evident mean-reverting behaviour. In the time following 2014, there are multiple peaks, which contribute to the period's subsequent instability.
Step 1 of the Engle-Granger procedure entails applying the ADF test to the residual. The residual is fitted to:
$$\Delta e_t = \phi e_{t-1} + \phi_{1} \Delta e_{t-1} + Const. + \epsilon_{t}$$The test results are:
| Const. | $\phi$ | $\phi_{1}$ | |
|---|---|---|---|
| coef | 0.00515058 | -0.01031 | 0.0105005 |
| std err | 0.0256242 | 0.00237045 | 0.0163046 |
| t-stats | 0.201004 | -4.34938 | 0.64402 |
| p-value | 0.840695 | 1.36523e-05 | 0.519563 |
According to the test result, we have:
$$\Delta e_t = -0.01031 e_{t-1} + 0.0105005 \Delta e_{t-1} + 0.00515058 + \epsilon_{t}$$The t-statistic of $\phi$ is -4.34938, to the left of the 1% critical value (-3.899). The corresponding p-value is 1.36523e-05, which is sufficiently small. So we can reject the null hypothesis with more than 99% confidence and conclude that the residual is a stationary process without a unit root.
Step 2 of the Engle-Granger procedure is then applied to the residual. The error correction equation to fit is:
$$\Delta P^{Feeder}_{t} = \phi \Delta P^{Live}_{t} - (1 - \alpha)\hat{e}_{t-1} + \epsilon_{t}$$where $\Delta P^{Live}_{t}$ and $\Delta P^{Feeder}_{t}$ are the changes of price on date $t$ of the corresponding futures, and $\hat{e}_{t-1}$ is the fitted stationary residual of date $t-1$ that we have obtained above.
It is necessary to confirm the significance of the coefficient $(1-\alpha)$:
Null Hypothesis $H_{0}$: the coefficient $(1-\alpha)$ is 0.
Alternative Hypothesis $H_{1}$: the coefficient $(1-\alpha)$ is not 0.
The test outcomes are:
| $\phi$ | $(1-\alpha)$ | |
|---|---|---|
| coef | 0.608639 | -0.00559515 |
| std err | 0.0149792 | 0.00172891 |
| t-stats | 40.6324 | -3.23622 |
| p-value | 0 | 0.00121122 |
The t-statistic of $(1-\alpha)$ is -3.23622, to the left of the 1% critical value (-2.576). The corresponding p-value is 0.00121122, which is less than 0.01. So we can reject the null hypothesis with more than 99% confidence and conclude that the coefficient $(1-\alpha)$ is not 0. However, the statistical test indicates that the coefficient $(1-\alpha)$ of the Live Cattle-Feeder Cattle direction is less significant than that of the Feeder Cattle-Live Cattle direction.
The coefficient $\phi$ is significantly not 0 as well.
The estimated coefficients of the Engle-Granger technique and the accompanying statistics are presented in the table below:
| Explanatory Future | Dependent Future | EG Step | T-stats | p-value | Reject the Null (1%) |
|---|---|---|---|---|---|
| Feeder | Live | ADF | -4.5112 | 6.44613e-06 | Yes |
| Feeder | Live | ECM (1-α) | -4.40908 | 1.03812e-05 | Yes |
| Feeder | Live | ECM φ | 40.6258 | 0 | Yes |
| Live | Feeder | ADF | -4.34938 | 1.36523e-05 | Yes |
| Live | Feeder | ECM (1-α) | -3.23622 | 0.00121122 | Yes |
| Live | Feeder | ECM φ | 40.6324 | 0 | Yes |
According to the results obtained so far, the Feeder Cattle-Live Cattle pair (with Live Cattle being the leading variable) yields the more significant findings. Thus, the Feeder Cattle-Live Cattle pair will be utilised for the statistical arbitrage strategy of this project.
The trading strategy's $\beta_{coint}$ is computed over the entire training dataset and treated as a constant parameter. However, it is interesting to calculate a rolling $\beta_{coint}$ to determine whether it fluctuates significantly throughout the training period. The rolling windows are one, two, three, five, and seven years. On each rolling window of the training set, the T-statistic of the ADF test and the coefficient of the error correction term $(1-\alpha)$ of the error correction equation are also calculated. Note that $\beta_{coint}$ used here refers to the linear regression parameter $\beta$ of 2.3.1, not the loading of 2.3.7. Their relationship is as follows:
$$\beta_{coint} = -Loading(1)$$The graph below displays the rolling $\beta_{coint}$ curves:
It is evident from the plot of Rolling β_coint for 1Y, 2Y, 3Y, 5Y and 7Y that:
The graph titled T-stats of Rolling ADF Test for 1Y, 2Y, 3Y, 5Y and 7Y depicts the t-stats of the rolling coefficients $\phi$ of the ADF test for rolling window sizes of 1 year, 2 years, 3 years, 5 years, and 7 years. The black dashed lines reflect the significance levels of 10%, 5%, and 1% for the critical values, with a degree of freedom of 3766.
The 1Y, 2Y, and 3Y curves are quite volatile and sensitive to the rolling dataset. They typically exceed the black dashed line, indicating that the computed coefficient $\phi$ based on the data of that rolling period is insignificant. Therefore, the cointegration relationship between two assets is not significant for the 1-year, 2-year, or 3-year periods.
The majority of the time, the 5Y curve remains below the critical value of significance at 10%, but seldom below the critical value of significance at 1%. The majority of the time, the 7Y curve remains below the 5% critical value of significance, but occasionally falls below the 1% critical value.
If the rolling $\beta_{coint}$ is applied to the strategy as an improvement, the rolling window size should be at least 5 years (at the 10% significance level) or 7 years (at the 5% significance level). However, it is highly recommended to select a longer rolling window to ensure the presence of the cointegration relationship.
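A rolling $\beta_{coint}$ of the kind discussed above can be computed with a simple windowed OLS slope (the helper name, window length and toy data are ours):

```python
import numpy as np
import pandas as pd

def rolling_beta(y, x, window):
    """Rolling OLS slope of y on x (with intercept) over a fixed window,
    e.g. window = 5 * 252 trading days for a 5-year rolling beta_coint.
    Entries before the first full window are NaN."""
    out = np.full(len(y), np.nan)
    for i in range(window, len(y) + 1):
        xs, ys = x[i - window:i], y[i - window:i]
        xc = xs - xs.mean()
        out[i - 1] = (xc @ (ys - ys.mean())) / (xc @ xc)  # OLS slope
    return pd.Series(out, index=getattr(y, "index", None))

# Toy cointegrated pair with a constant true slope of 0.62.
rng = np.random.default_rng(7)
x = 100 + rng.normal(0, 1, 1500).cumsum()
y = 25 + 0.62 * x + rng.normal(0, 2, 1500)
rb = rolling_beta(y, x, window=252)
```

With a genuinely constant relationship, the rolling estimates fluctuate around 0.62; persistent drifts away from the full-sample value would signal an unstable $\beta_{coint}$.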
The plot above T-stats of Rolling (1-α) for 1Y, 2Y, 3Y, 5Y and 7Y draws the curves of the t-stats of the rolling coefficients of the error correction term $(1-\alpha)$ of the error correction equation for rolling window sizes of 1 year, 2 years, 3 years, 5 years and 7 years. The black dashed line represents the critical value of T-distribution at significance level of 1% with degree of freedom of $\infty$.
The 1Y and 2Y curves are highly volatile and sensitive to the rolling dataset. They regularly exceed the black dashed line, indicating that the $(1-\alpha)$ coefficient determined from the data of that rolling period is insignificant. The cointegration relationship between two assets is therefore not always significant based on the 1Y or 2Y term.
The 3Y curve remains above the black dashed line until June 2009, after which it nearly always falls below the critical value.
Nevertheless, the graph reveals that the 5Y curve (purple) is below the black dashed line the majority of the time, and the 7Y curve is always below the dashed line. The rolling window size should be five years (acceptable), seven years (far better), or longer if the rolling $\beta_{coint}$ is added to the approach as an improvement.
The 2-d Kalman Filter is applied to the regression on the actual Live Cattle price and predicted Live Cattle price based on the regression equation obtained in 2.3.1: $P^{pred.Live}_{t} = 25.1685 + 0.624909\times P^{Feeder}_{t}$. The Kalman Filter provides an adaptive estimate of $\beta_{coint}$ which has relation of:
$$\begin{align} P^{Live}_{t} &= {slope}\, P^{pred.Live}_{t} + {intercept}\\ &= {slope} \left(\beta_{coint} P^{Feeder}_{t} + \mu_{e}\right) + {intercept}\\ &= \left({slope}\,\beta_{coint}\right)P^{Feeder}_{t} + \left({slope}\,\mu_{e} + {intercept}\right)\\ & = \beta^{new}_{coint}P^{Feeder}_{t} + \mu^{new}_e \end{align} $$where $slope$ and $intercept$ are the regressing results from Kalman Filter; the parameters $\beta^{new}_{coint}$ and $\mu^{new}_e$ are the adaptive estimated $\beta_{coint}$ and $\mu_{e}$.
Note that $\beta_{coint}$ and $\mu_{e}$ used here are parameters of the linear regression of 2.3.1, which are different from the loading and equilibrium of 2.3.7.
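A from-scratch sketch of a 2-d Kalman Filter regression with state [slope, intercept] (the tuning parameters delta and obs_var are assumed values, not taken from the report):

```python
import numpy as np

def kalman_regression(y, x, delta=1e-4, obs_var=1.0):
    """Adaptive linear regression y_t ≈ slope_t * x_t + intercept_t.

    The state [slope, intercept] follows a random walk with process noise
    delta/(1-delta) * I; obs_var is the observation noise variance."""
    n = len(y)
    state = np.zeros(2)                      # [slope, intercept]
    P = np.eye(2)                            # state covariance
    Q = delta / (1.0 - delta) * np.eye(2)    # process noise
    slopes, intercepts = np.empty(n), np.empty(n)
    for t in range(n):
        H = np.array([x[t], 1.0])            # observation vector
        P = P + Q                            # predict
        S = H @ P @ H + obs_var              # innovation variance
        K = P @ H / S                        # Kalman gain
        state = state + K * (y[t] - H @ state)   # update
        P = P - np.outer(K, H) @ P
        slopes[t], intercepts[t] = state
    return slopes, intercepts

# Toy pair with a constant true relation y = 25 + 0.62 x + noise.
rng = np.random.default_rng(8)
x = 100 + rng.normal(0, 1, 2000).cumsum()
y = 25 + 0.62 * x + rng.normal(0, 1, 2000)
slopes, intercepts = kalman_regression(y, x)
resid = y - (slopes * x + intercepts)        # residual under adaptive parameters
```

As in the report, the adaptive fit produces a residual with a much smaller standard deviation than a single full-sample regression, because the parameters track local conditions.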
Different Kalman Filter parameter settings have little effect on the output curves and convergence speed for our training data. The following is the statistics table after applying the Kalman Filter:
| Statistics | Value |
|---|---|
| $\beta_{coint} $ | 0.6249 |
| Mean of $\beta^{new}_{coint}$ | 0.6182 |
| $\mu_{e}$ | 25.1685 |
| Mean of $\mu^{new}_{e}$ | 25.8593 |
| Mean of Original Residual | 8.0146e-5 |
| Mean of Kalman filtered Residual | -0.9027 |
| Std of Original Residual | 7.0761 |
| Std of Kalman filtered Residual | 2.0402 |
The mean of the adaptive $\beta_{coint}$ based on the Kalman Filter is slightly smaller (about 1.07%) than the original constant $\beta_{coint}$, while the mean of the adaptive $\mu_{e}$ is slightly larger (about 2.75%) than the original constant $\mu_{e}$. The mean of the residual between the actual price of Live Cattle and the price predicted with adaptive $\beta_{coint}$ and $\mu_{e}$ has a significantly larger absolute value than the mean of the residual from the original regression, whereas its standard deviation is significantly smaller (from 7.08 to 2.04).
The following graphs illustrate the relationship between adaptive parameters and constant parameters, as well as residuals generated from adaptive parameters and constant parameters.
The plot Adaptive β_coint based on Kalman Filter shows the adaptive parameter $\beta^{new}_{coint}$. The black dashed line is the constant $\beta_{coint}$ of 2.3.1, which is 0.624909. The orange dashed line is the mean of $\beta^{new}_{coint}$. $\beta^{new}_{coint}$ varies between 0.84 and 1.15 times $\beta_{coint}$, so the re-estimated $\beta_{coint}$ from the Kalman Filter is relatively stable, within a range of about $\pm 15\%$.
The graph Adaptive μ_e based on Kalman Filter illustrates the adaptive parameter $\mu^{new}_{e}$. The black dashed line is the constant $\mu_{e}$ of 2.3.1, which is 25.1685. The orange dashed line is the mean of $\mu^{new}_{e}$. $\mu^{new}_{e}$ varies between 0.88 and 1.19 times $\mu_{e}$, so the re-estimated $\mu_{e}$ from the Kalman Filter is relatively stable, with a spread of about 31%.
The figure titled Original Residual vs. Residual after Kalman Filter depicts the original residual of 2.3.1 (orange) and the residual after the application of the Kalman Filter (blue). It is evident that the amplitude of residual after the Kalman Filter is significantly smaller, indicating that the regression is more precise and less volatile, with a standard deviation of 2.04 only. This indicates the superior regression capability of the Kalman Filter regression.
Applying Steps 1 and 2 of the Engle-Granger procedure to the residual obtained with the adaptive $\beta_{coint}$ estimated by the Kalman Filter, we have:
| Step 1 | Residual after Kalman Filter | Original Residual |
|---|---|---|
| t-stats | -15.2382 | -4.5112 |
| p-value | 5.20740e-28 | 6.44613e-06 |

| Step 2 | (1-α) Residual after Kalman Filter | (1-α) Original Residual |
|---|---|---|
| t-stats | -2.9395 | -4.40908 |
| p-value | 0.0032874 | 1.03812e-05 |
According to Step 1 (the Augmented Dickey-Fuller test with lag of 1), both the t-statistic and the p-value demonstrate that the residual from the adaptive $\beta_{coint}$ is significantly more stationary than the original residual. The result of Step 2 (Error Correction) is less significant than the result in 2.3.1, but is still significant at the 1% level.
It seems that the residual obtained from the adaptive parameters $\beta^{new}_{coint}$ and $\mu^{new}_{e}$ has better stationarity properties. However, we will still use the constant $\beta_{coint}$ and $\mu_{e}$ in the construction of the trading strategy.
In this section, the Feeder Cattle-Live Cattle pair's stationary residual is fitted to the Ornstein-Uhlenbeck mean reversion process. The fitted AR(1) model formula is:
$$e_{t} = C + B e_{t-1} + \epsilon_{t,\tau}$$The outcomes are displayed in the table:
| C | B | |
|---|---|---|
| coef | -0.00237265 | 0.989116 |
| std err | 0.0172047 | 0.00243215 |
| t-stats | -0.137907 | 406.683 |
| p-value | 0.890314 | 0 |
$B$ and $C$ are AR(1) model coefficients used to compute the trading strategy's parameters.
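Using the discretisation $e_t = C + B e_{t-1} + \epsilon_t$ with $B = e^{-\theta\tau}$, the mapping from the AR(1) fit to the OU parameters reported below can be sketched as follows (here $\tau = 1/3766$, i.e. the 3766-observation training sample spans one unit of time, so the half-life in days is the half-life times 3766; the residual standard deviation 1.0555 is back-solved from the reported $\sigma_{eq}$ for illustration):

```python
import numpy as np

def ou_from_ar1(B, C, sd_eps, tau):
    """Map AR(1) coefficients  e_t = C + B e_{t-1} + eps  to the OU process
    de_t = theta (mu_e - e_t) dt + sigma_ou dW_t  sampled at interval tau:

        B = exp(-theta * tau)            ->  theta    = -ln(B) / tau
        C = mu_e (1 - B)                 ->  mu_e     = C / (1 - B)
        sd_eps^2 = sigma_eq^2 (1 - B^2)  ->  sigma_eq = sd_eps / sqrt(1 - B^2)
        sigma_ou = sigma_eq * sqrt(2 * theta)
    """
    theta = -np.log(B) / tau
    mu_e = C / (1.0 - B)
    sigma_eq = sd_eps / np.sqrt(1.0 - B ** 2)
    sigma_ou = sigma_eq * np.sqrt(2.0 * theta)
    halflife = np.log(2.0) / theta
    return theta, mu_e, sigma_eq, sigma_ou, halflife

# Values from the fitted AR(1) above; tau treats the whole training sample
# (3766 observations) as one unit of time.
theta, mu_e, sigma_eq, sigma_ou, halflife = ou_from_ar1(
    B=0.989116, C=-0.00237265, sd_eps=1.0555, tau=1 / 3766)
```

Plugging in the fitted $B$ and $C$ reproduces the parameter table below: $\theta \approx 41.21$, $\mu_e \approx -0.218$, $\sigma_{eq} \approx 7.17$ and a half-life of about 63.3 working days.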
The plot Residual vs AR(1)-predicted Residual above shows the original residual of the futures pair and the residual predicted by the AR(1) model. The fit over the period is excellent. The observable periods where the predicted residual does not fit the real residual are November 2002, November and December 2003, December 2005 and September 2008.
The graph above, Residual of Fitting to OU Process, shows the difference between the AR(1)-predicted residual and the actual residual. The difference is easier to see here than in the Residual vs AR(1)-predicted Residual plot: there are numerous peaks, many of which are hard to discern in the preceding plot despite the large differences shown here. This is because the actual and predicted residuals both take large absolute values at those points, so, expressed as a percentage of the residual at that moment, these peaks are not significantly large.
Based on the values of $B$ and $C$, we have the parameters of mean-reversion as:
| Parameters | Values |
|---|---|
| $\theta$ | 41.2149 |
| $\mu_{e}$ | -0.217989 |
| $\sigma_{eq}$ | 7.17367 |
| $\sigma_{ou}$ | 65.1303 |
| halflife | 0.0168179 |
| halflife(days) | 63.3362 |
The halflife is 63.3362 working days, or approximately 90 calendar days. This is unfavourable for the trading strategy: because the halflife is relatively long, the average position holding period is expected to be long as well. The speed of mean reversion is not particularly fast, so we should not expect many trades when implementing the strategy.
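The mapping from the AR(1) coefficients $(C, B)$ to the OU parameters can be sketched as below. Two assumptions are made for illustration: $\sigma_{\epsilon} \approx 1.0556$ is backed out from the table's $\sigma_{eq}$, and the time step is taken as $\tau = 1/252$; with that $\tau$ the speed $\theta$ and halflife come out in units that differ from the table's values, while the unit-free halflife in trading days, $\ln 2 / (-\ln B) \approx 63.34$, does match the table.

```python
import math

def ou_params(C, B, sigma_eps, tau):
    """Map AR(1) coefficients to OU parameters for step size tau."""
    theta = -math.log(B) / tau                       # mean-reversion speed
    mu_e = C / (1.0 - B)                             # long-run equilibrium level
    sigma_eq = sigma_eps / math.sqrt(1.0 - B * B)    # stationary (equilibrium) std
    sigma_ou = sigma_eq * math.sqrt(2.0 * theta)     # OU diffusion coefficient
    halflife = math.log(2.0) / theta                 # in the time units set by tau
    return theta, mu_e, sigma_eq, sigma_ou, halflife

# Report's fitted values; sigma_eps and tau are assumptions (see lead-in).
theta, mu_e, sigma_eq, sigma_ou, hl = ou_params(
    C=-0.00237265, B=0.989116, sigma_eps=1.0556, tau=1.0 / 252.0)
print(mu_e)       # ≈ -0.2180, matching the table's equilibrium
print(hl * 252)   # halflife in trading days, ln2/(-ln B) ≈ 63.34
```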
The value of $\mu_{e}$ is -0.217989, which is close to 0. $\sigma_{eq}$ is the most crucial parameter. The subsequent graphic, titled Mean Reverting Residuals with Equilibrium and Bounds, depicts the residual, the equilibrium $\mu_{e}$ as a blue line, and the bounds $\mu_{e} \pm \sigma_{eq}$ as pink dashed lines. The residual occasionally crosses the pink lines, creating arbitrage opportunities.
The following table provides the most crucial parameters for the creation of a trading strategy:
| Parameters | Values |
|---|---|
| Loading($\beta'_{coint}$) | [1, -0.624909] |
| Halflife(working days) | 63.3362 |
| Trading sigma($\sigma_{eq}$) | 7.17367 |
| Equilibrium($\mu_{e}$) | -0.217989 |
As previously stated, the half-life is 63 working days, which is detrimental to the trading strategy. During the subsequent construction of the trading strategy, greater emphasis should be placed on optimizing the $Z$ value. Since $\mu_{e} \pm Z\sigma_{eq}$ generates the trading signals, we must trade off profitability against risk when optimizing $Z$.
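The band-based signal logic can be sketched as below. This is illustrative only: the exact entry/exit conventions in the project's notebook may differ, and the treatment of the stop-loss band ($Z_{s}$) here is an assumption.

```python
# Sketch of band-based signal generation: enter when the residual crosses
# mu_e ± Z*sigma_eq, exit near equilibrium at mu_e ± Z_e*sigma_eq, and stop
# out beyond mu_e ± Z_s*sigma_eq.
def positions(residual, mu_e, sigma_eq, Z_e=0.03, Z=1.5, Z_s=2.8):
    pos, out = 0, []
    for e in residual:
        z = (e - mu_e) / sigma_eq            # standardised deviation
        if pos == 0:
            if -Z_s < z < -Z:
                pos = +1                     # spread too low: long the portfolio
            elif Z < z < Z_s:
                pos = -1                     # spread too high: short the portfolio
        elif pos == +1 and (z >= -Z_e or z <= -Z_s):
            pos = 0                          # take profit near mu_e, or stop out
        elif pos == -1 and (z <= Z_e or z >= Z_s):
            pos = 0
        out.append(pos)
    return out

# Toy residual path (mu_e = 0, sigma_eq = 1): long entry, exit, short entry, stop.
sig = positions([0, -2.0, -1.0, 0.1, 2.0, 3.5, 0], mu_e=0.0, sigma_eq=1.0)
print(sig)  # → [0, 1, 1, 0, -1, 0, 0]
```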
The corresponding code can be found in Appendix 4 - Trading and Backtesting.ipynb
In this section, the trading strategy is developed using the parameters outlined in section 2.3.7. On both training set and testing set, the performance of the approach is evaluated. Further considerations on the impact of bid-ask spread, transaction costs, and strategy resilience are provided as well.
N.B. A 'trade' in the context of the backtest represents one entry and exit of the position.
Let's begin by examining the performance of the trading method. The parameters $Z_{e}$, $Z$, and $Z_{s}$ are not yet optimized and are respectively set to 0, 1 and 3.5. The bid-ask spread and transaction cost are equal to zero. The Acceptable Maximum Drawdown is 100%. The assumed risk-free interest rate is zero. Following is a list of the strategy's assumptions and parameter settings:
| Parameter/Assumption | Value |
|---|---|
| $Z_{e}$ | 0 |
| $Z$ | 1 |
| $Z_{s}$ | 3.5 |
| spread | 0 |
| longcost | 0 |
| shortcost | 0 |
| MADD | 1 |
| risk-free interest rate | 0 |
Based on the preceding assumptions and settings, the backtest table for the training set is as follows:
| Backtest | |
|---|---|
| Start Date | 2002-03-05 |
| End Date | 2017-10-05 |
| Total number of trades | 23.0 |
| Average trades per year | 1.54 |
| Average trading days (Calendar) | 146.35 |
| Annual Return | 130.15% |
| Cumulative Return | 5124.29% |
| Average Return per trade | 117.6% |
| Annual volatility | 54.94% |
| Annual Alpha | 0.26 |
| Beta | 0.07 |
| Information Ratio | 0.52 |
| Sharpe Ratio | 0.55 |
| Max Drawdown | -60.46% |
| Daily Value at Risk(99%) | -7.94% |
| Drawdown Control | True |
There are 23 trades during the course of 15 years and 7 months of training, which equates to an average of 1.54 trades each year. The backtesting period begins on 2002-03-05 and ends on 2017-10-05. The average position holding time per trade is 146.35 calendar days or 104.54 trading days. It is longer than the half-life, which is reasonable given the previous discussion in 1.4.2.
The average annual return is 130.15%, while the average return per trade is 117.6%.
The average annual volatility is 54.94%, which is quite high. With $Z_{s} = 3.5$, there is no stop-loss behaviour in the strategy. With the optimization of $Z_{s}$, annual volatility is expected to fall.
The annual alpha is 0.26, a positive value. Thus, the strategy outperforms the market on average; the excess return is positive. The strategy's exposure to market risk is minimal, given its beta is 0.07. This is appropriate given that the strategy trades the abnormal residual that generates arbitrage opportunities. This chance for arbitrage is unrelated to market risk.
Information ratio and Sharpe ratio are respectively 0.52 and 0.55. The return per unit of risk is not particularly rewarding due to the strategy's significant volatility.
The maximum drawdown during the trading period is enormous at 60.46%. The average daily value at risk (99%) is 7.94%. This is for the same reason: there is no stop-loss in place with $Z_{s} = 3.5$. As the maximum acceptable drawdown (MADD) has been set to a very high level of 100%, the drawdown is still under control.
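For reference, the maximum drawdown and a historical-simulation daily VaR can be computed from an equity curve and a daily return series as sketched below. These are generic textbook definitions; the self-built platform's exact conventions may differ.

```python
import numpy as np

def max_drawdown(equity):
    """Most negative excursion below the running high-water mark."""
    equity = np.asarray(equity, dtype=float)
    peak = np.maximum.accumulate(equity)     # running peak so far
    return (equity / peak - 1.0).min()

def daily_var(returns, level=0.99):
    """Historical-simulation VaR: the (1-level) quantile of daily returns."""
    return np.quantile(np.asarray(returns, dtype=float), 1.0 - level)

print(max_drawdown([100, 120, 90, 110, 80]))  # ≈ -0.3333 (peak 120 down to 80)
```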
The corresponding backtest plots are as following:
The above backtesting charts demonstrate:
According to the plots of rolling metrics, the 1Y rolling curve is flatter and more stable than the 6M rolling curve. This observation is reasonable because the rolling window size is larger for the 1Y rolling parameters, which results in less sensitive metrics. The idea is the same as discussed in 2.3.4. The strategy is extremely risky in general, and even more so as the backtesting period progresses beyond 2015.
In this section, parameters of $Z_{e}$, $Z$ and $Z_{s}$ are optimised in order to maximise the Sharpe ratio of the trading strategy based on the backtest result of the training dataset. The searching method used is a self-built Grid search algorithm with candidate values of the parameters listed as:
| Parameter | Candidates |
|---|---|
| $Z_{e}$ | 0.0, 0.03, 0.06, 0.09, 0.12, 0.15 |
| $Z$ | 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.35, 1.4, 1.45, 1.5, 1.55, 1.6 |
| $Z_{s}$ | 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1 |
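A self-built grid search of this kind can be sketched as below. `sharpe_of` is a placeholder standing in for the real backtest call (which is not reproduced here), and the ordering constraint $Z_{e} < Z < Z_{s}$ is an assumption about which combinations are meaningful.

```python
# Exhaustive grid search over (Z_e, Z, Z_s), keeping the best Sharpe ratio.
from itertools import product

def grid_search(Ze_list, Z_list, Zs_list, sharpe_of):
    best, best_params = float("-inf"), None
    for Ze, Z, Zs in product(Ze_list, Z_list, Zs_list):
        if not (Ze < Z < Zs):            # bounds must be ordered to make sense
            continue
        s = sharpe_of(Ze, Z, Zs)         # one full backtest per combination
        if s > best:
            best, best_params = s, (Ze, Z, Zs)
    return best_params, best

# Toy objective with a known optimum at (0.03, 1.5, 2.8):
toy = lambda Ze, Z, Zs: -((Ze - 0.03)**2 + (Z - 1.5)**2 + (Zs - 2.8)**2)
params, s = grid_search([0.0, 0.03, 0.06], [1.0, 1.5], [2.8, 3.1], toy)
print(params)  # → (0.03, 1.5, 2.8)
```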
The Grid search gives the Maximum Sharpe Ratio parameters $Z_{e}=0.03$, $Z=1.5$ and $Z_{s} = 2.8$. The new parameter settings and assumptions are listed below:
| Parameter/Assumption | Value |
|---|---|
| $Z_{e}$ | 0.03 |
| $Z$ | 1.5 |
| $Z_{s}$ | 2.8 |
| spread | 0 |
| longcost | 0 |
| shortcost | 0 |
| MADD | 0.7 |
| risk-free interest rate | 0 |
The results of the training set backtest with the revised settings are as follows:
| Backtest | |
|---|---|
| Start Date | 2002-03-05 |
| End Date | 2017-10-05 |
| Total number of trades | 16.0 |
| Average trades per year | 1.07 |
| Average trading days (Calendar) | 136.69 |
| Annual Return | 133.86% |
| Cumulative Return | 7804.86% |
| Average Return per trade | 131.3% |
| Annual volatility | 39.33% |
| Annual Alpha | 0.28 |
| Beta | 0.08 |
| Information Ratio | 0.55 |
| Sharpe Ratio | 0.57 |
| Max Drawdown | -53.47% |
| Daily Value at Risk(99%) | -5.65% |
| Drawdown Control | True |
During the 15 years and 7 months of the training set, there are 16 trades, or an average of 1.07 trades each year. Due to the wider entry bounds ($Z$ increases from 1.0 to 1.5), there are fewer trades than previously, which makes it more difficult to activate the trading signals.
The average position holding time per trade is 136.69 calendar days or roughly 97.64 trading days. It becomes shorter than it was previously. The average annual return is 133.86%, while the average return per trade is 131.3%. Both the annual return and average per trade return have grown slightly.
The annual volatility is 39.33%, which is significantly lower than before (54.94%). However, it remains rather high. $Z_{s}$ is set to 2.8, which prevents further loss in a trade if the residual deviates too far from equilibrium.
The annual alpha rises from 0.26 to 0.28. Beta is 0.08, which is a slight rise but still sufficiently small.
The information ratio and Sharpe ratio are correspondingly 0.55 and 0.57. In comparison to the non-optimized strategy, both have increased.
The maximum drawdown has become 53.47%, which is less than before (60.46%). Daily value at risk (99%) is now 5.65%, a decrease from 7.94% previously. The maximum acceptable drawdown (MADD) has been set to 70% and is under control.
After optimization, it is evident that the risk metrics (annual volatility, maximum drawdown, and daily VaR) all drop. The new stop-loss parameter setting $Z_{s}$ is effective. However, the strategy is still highly risky. If the trader has a greater aversion to risk, this strategy would not be appealing. The MADD is set at 70%; if it were set at 60%, the drawdown would be unmanageable!
The following are the related backtest charts on training data with optimal Sharpe ratio parameter setting:
According to the backtesting plots, there are no trades at all during the whole four-year period from May 2007 to May 2011. As previously mentioned, the residual is rather stable and less erratic during the middle portion of the training period. With $Z=1.5$, the entry bounds are wide, so the middle-part residual cannot reach the entry bounds to generate trading signals. This four-year no-trade period is regarded as a drawback of the optimal Sharpe ratio strategy.
In general, the strategy is still fairly risky, but it has become less so compared to the method that lacked stop-loss control.
Following is a listing of each trade's profit and loss:
| Trade | P&L |
|---|---|
| 1 | 166.4 % |
| 2 | 123.37 % |
| 3 | 162.07 % |
| 4 | 140.61 % |
| 5 | 147.94 % |
| 6 | 140.26 % |
| 7 | 128.24 % |
| 8 | 132.66 % |
| 9 | 129.24 % |
| 10 | 108.74 % |
| 11 | 161.29 % |
| 12 | 136.1 % |
| 13 | 136.9 % |
| 14 | 116.1 % |
| 15 | 75.02 % |
| 16 | 128.47 % |
According to the table, trade No. 15 has a negative return (less than 100%). This is due to the stop-loss event in February 2017, which triggered the trade's stop-loss exit and resulted in a loss. The returns on trades No. 1, No. 3, and No. 11 are greater than 160%. The strategy with $Z=1.5$ is a 'few trades with large returns' strategy.
We apply the optimal Sharpe ratio parameter setting determined in section 3.2 to the backtest on the testing set. The testing dataset's backtest table is shown below:
| Backtest | |
|---|---|
| Start Date | 2017-10-09 |
| End Date | 2022-10-28 |
| Total number of trades | 8.0 |
| Average trades per year | 1.61 |
| Average trading days (Calendar) | 111.0 |
| Annual Return | 161.62% |
| Cumulative Return | 1092.22% |
| Average Return per trade | 129.74% |
| Annual volatility | 60.25% |
| Annual Alpha | 0.49 |
| Beta | 0.05 |
| Information Ratio | 0.75 |
| Sharpe Ratio | 0.79 |
| Max Drawdown | -56.94% |
| Daily Value at Risk(99%) | -8.62% |
| Drawdown Control | False |
During the five-year testing period, there are 8 trades, or 1.61 trades per year on average. The average position holding time per trade is around 79.29 trading days or 111.0 calendar days. The average annual return is 161.62%, while the average return per trade is 129.74%. The average annual return is greater than that of the training set, although the average return of each trade is slightly lower.
The annual volatility is 60.25%, which is greater than the training set's backtested volatility.
The annual alpha is 0.49, which is more than the training set's backtested alpha (0.28). The beta is 0.05, which is less than the training set's beta (0.08).
The information ratio and Sharpe ratio are 0.75 and 0.79, respectively, which are both greater than the values on the training set.
The maximum drawdown is 56.94%, which exceeds the maximum drawdown of the training set (53.47%). Daily value at risk (99%) is 8.62%, which is also greater than the value from the training set (5.65%). The maximum acceptable drawdown (MADD) is still set to 70%, but the drawdown is out of control for the testing set, and remains unmanageable unless the MADD is set above 75%. The same stop-loss strategy fails to control the risk on the testing set.
Overall, both return and risk are greater in the testing dataset. Traders must have a high tolerance to risk (MADD of 75%) to employ this method. The respective backtesting plots are depicted as follows:
According to the plots of the testing set, there are no trades over the full year between November 2010 and November 2011:
The following table details the profit and loss of each trade in the testing set:
| Trade | P&L |
|---|---|
| 1 | 136.74 % |
| 2 | 127.73 % |
| 3 | 137.67 % |
| 4 | 142.96 % |
| 5 | 132.1 % |
| 6 | 165.22 % |
| 7 | 77.7 % |
| 8 | 137.73 % |
According to the table, trade No. 7 has a negative return (less than 100%). This is because a stop-loss event occurred on this trade and triggered the stop-loss exit, resulting in a loss. The trade with the highest return is No. 6, at 165.22%.
In order to compare the performance of the strategy with the market return, a backtest using market data from the same time period as the testing set (2017-10-09 to 2022-10-28) is shown below. The market return is represented by the return of the S&P500 ETF index, and the backtesting platform is pyfolio.
Listed below are the results of the market return backtesting:
The definition of 'Return' in pyfolio differs from this project's self-built backtest platform. After converting the pyfolio return to the self-constructed backtest return, the market and strategy performances for the testing period are:
| Market | Strategy | |
|---|---|---|
| Annual Return | 108.93% | 161.62% |
| Cumulative Return | 154.13% | 1092.22% |
| Annual Volatility | 21.452% | 60.25% |
| Sharpe Ratio | 0.39 | 0.79 |
| Max drawdown | -36.103% | -56.94% |
| Daily VaR | -2.669%(95%) | -8.62%(99%) |
According to the comparison table, the statistical strategy offers a better return than the market, but also a higher risk. The Sharpe ratio for the strategy is greater than that of the market, indicating that the strategy beats the market in terms of return-risk ratio.
This section analyses the effects of the bid-ask spread and transaction costs on the performance of the strategy. In addition, the effect of uncertainty in the parameter $\mu_{e}$ on performance is discussed. Finally, an exploration of the relationship between the total number of trades and the strategy's performance is presented.
In this part, we analyse the effect of the bid-ask spread on the performance of the trading strategy. $Z$, $Z_{e}$, and $Z_{s}$ are fixed at the optimal Sharpe ratio setting. During the trading period, the bid-ask spread is assumed to stay constant for both traded assets. The bid-ask spread varies from 0 to 0.5 in increments of 0.01. Following are graphs of the strategy's Cumulative Return, Volatility, and Sharpe Ratio for various bid-ask spreads:
According to the plots above:
In this part, we study the influence of the transaction cost of longing the tradable assets on the strategy's performance. $Z$, $Z_{e}$ and $Z_{s}$ are fixed at the optimal Sharpe ratio setting. The transaction cost is assumed to be a constant fraction of the trading amount for both tradable assets on each day of the trading period. The transaction cost of longing varies from 0 to 0.005 in increments of 0.0001. The subsequent graphs depict the Cumulative Return, Volatility, and Sharpe Ratio of the strategy for varying transaction costs:
According to the plots above:
In this part, we study the influence of the transaction cost of shorting the tradable assets on the strategy's performance. $Z$, $Z_{e}$ and $Z_{s}$ are fixed at the optimal Sharpe ratio setting. The transaction cost is assumed to be a constant fraction of the trading amount for both tradable assets on each day of the trading period. The transaction cost of shorting ranges from 0 to 0.005 in steps of 0.0001. Following are plots of the Cumulative Return, Volatility and Sharpe Ratio of the strategy for different transaction costs of shorting:
According to the plots above:
In this part, we study the robustness of the strategy's performance with respect to the parameter $\mu_{e}$. Unlike the parameter $\sigma_{eq}$, which can be adjusted via $Z$, $Z_{e}$ and $Z_{s}$, there is no adjusting coefficient in front of $\mu_{e}$. It is therefore important to study how uncertainty in $\mu_{e}$ influences the strategy's performance; this is the robustness analysis. A set of small biases ranging from -25% to 25% in steps of 2.5% is applied to $\mu_{e}$ as follows:
$$\mu^{new}_{e} = \mu_{e}(1+\mathrm{bias})$$

Following are graphs of the Cumulative Return, Volatility and Sharpe Ratio of the strategy for various biases:
The above graphs demonstrate that the effect of the biased $\mu_{e}$ on the performance of the strategy is not as straightforward as the effects of bid-ask spread and transaction costs:
The influence curve of the biased $\mu_{e}$ on performance is not continuous: the changes in cumulative return, volatility, and Sharpe ratio occur in discrete increments. In most instances, changes in $\mu_{e}$ do not affect cumulative return, volatility, or the Sharpe ratio; occasionally, however, fluctuations in $\mu_{e}$ cause them to jump. This is likely due to the mechanism for generating trade signals:
The trading bounds $\mu_{e} \pm Z\sigma_{eq}$, $\mu_{e} \pm Z_{s}\sigma_{eq}$ and $\mu_{e} \pm Z_{e}\sigma_{eq}$ are influenced by the change in $\mu_{e}$. However, as the changes in $\mu_{e}$ are quite small compared to the values of $Z\sigma_{eq}$ and $Z_{s}\sigma_{eq}$, in most cases the small shifts of the bounds caused by $\mu_{e}$ do not change the number of trades. Because the different bounds shift simultaneously, their influences on P&L balance each other out, and the cumulative return, volatility, and Sharpe ratio are unaffected. This is why, in the majority of instances, changes in $\mu_{e}$ do not result in changes in cumulative return, volatility, or Sharpe ratio.
Occasionally, however, the slight changes in the new bounds produced by the change in $\mu_{e}$ generate new trading signals or disable the original trading signals, resulting in the generation of new trades or the disappearance of some original trades. In the event that this occurs, the cumulative returns, volatility, and Sharpe ratio will have stepwise adjustments.
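The bias sweep described above can be sketched as below. `run_backtest` is a placeholder for the project's backtest function, which is not reproduced here; the sketch only shows how the grid of biased equilibria is constructed.

```python
# Robustness sweep: rerun the backtest with mu_e biased by -25%..+25%.
import numpy as np

def bias_sweep(mu_e, run_backtest, lo=-0.25, hi=0.25, step=0.025):
    results = {}
    for bias in np.arange(lo, hi + step / 2, step):   # inclusive endpoint
        mu_new = mu_e * (1.0 + bias)                  # biased equilibrium
        results[round(float(bias), 3)] = run_backtest(mu_new)
    return results

# With a placeholder backtest, check the grid covers -25%..25% in 2.5% steps:
out = bias_sweep(-0.217989, run_backtest=lambda mu: mu)
print(sorted(out)[0], sorted(out)[-1], len(out))  # → -0.25 0.25 21
```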
In this part, we study the relationship between the total number of trades and the strategy's performance. With all the parameters except $Z$ fixed at the optimal Sharpe ratio setting, $Z$ varies from 0.1 to 2.0 in steps of 0.1. The total number of trades is recorded together with the corresponding cumulative returns, volatilities and Sharpe ratios. Among all the cumulative returns with the same total number of trades, the maximum is selected to represent the cumulative return for that number of trades. The same methodology applies to the volatility and Sharpe ratio metrics as well.
With $Z_{e}$ and $Z_{s}$ fixed, the total number of trades can be regarded as roughly proportional to $\frac{1}{Z}$. According to the graphs above:
All of the observations and conclusions in section 3.5 are heavily dependent on the cointegrated pair chosen for this project. It is not expected that these findings and conclusions would be consistent with the long/short trading strategies based on other pairs.
In this section, parameters of $Z_{e}$, $Z$ and $Z_{s}$ are optimised again to maximise the Sharpe ratio with bid-ask spread of 0.5 and transaction costs of 0.0002. The search is based on the backtest result of the training dataset. Similar to before, the search method and candidate values for $Z_{e}$, $Z$, and $Z_{s}$ remain the same.
The Grid search yields the same Maximum Sharpe Ratio parameter values as before: $Z_{e}=0.03$, $Z=1.5$, and $Z_{s}=2.8$. As stated in section 3.5, the effects of the bid-ask spread and transaction costs on the strategy are continuous, monotonic, and in the same direction; thus, the optimal setting should remain unchanged. The following are the parameter settings and assumptions:
| Parameter/Assumption | Value |
|---|---|
| $Z_{e}$ | 0.03 |
| $Z$ | 1.5 |
| $Z_{s}$ | 2.8 |
| spread | 0.5 |
| longcost | 0.0002 |
| shortcost | 0.0002 |
| MADD | 0.7 |
| risk-free interest rate | 0 |
The corresponding backtesting table is:
| Backtest | |
|---|---|
| Start Date | 2002-03-05 |
| End Date | 2017-10-05 |
| Total number of trades | 16.0 |
| Average trades per year | 1.07 |
| Average trading days (Calendar) | 136.69 |
| Annual Return | 131.02% |
| Cumulative Return | 5660.64% |
| Average Return per trade | 128.69% |
| Annual volatility | 40.38% |
| Annual Alpha | 0.26 |
| Beta | 0.09 |
| Information Ratio | 0.52 |
| Sharpe Ratio | 0.54 |
| Max Drawdown | -54.7% |
| Daily Value at Risk(99%) | -5.81% |
| Drawdown Control | False |
According to the table, we have:
The cumulative return falls from 7804.86% to 5660.64%, the annual return falls from 133.86% to 131.02%, and the annual alpha falls from 0.28 to 0.26.
Annual volatility increases from 39.33% to 40.38%; max drawdown increases from 53.47% to 54.7%; daily VaR increases from 5.65% to 5.81%. Beta also increases slightly, from 0.08 to 0.09. Moreover, once the spread and transaction costs are taken into account, the drawdown is no longer under control at MADD = 0.7.
Sharpe ratio drops from 0.57 to 0.54; information ratio falls from 0.55 to 0.52.
The outcome is foreseeable and consistent with the studies of 3.5.1, 3.5.2 and 3.5.3: bid-ask spread and transaction costs increase the risk of the strategy while reducing its return.
The accompanying graphs represent the backtest with bid-ask spread and transaction costs on the training set. There is no notable difference from the plots of 3.2. The only obvious difference is found in the plot Bid, Ask and Trading Prices of Portfolio: with the bid-ask spread and transaction costs considered, the bid and ask prices of the portfolio are distinct. Depending on the current position, the trading price of the portfolio equals one or the other.
We apply the same optimal Sharpe ratio parameter settings obtained in 3.6 to the backtest on the testing set. The testing dataset's backtest table is shown below:
| Backtest | |
|---|---|
| Start Date | 2017-10-09 |
| End Date | 2022-10-28 |
| Total number of trades | 8.0 |
| Average trades per year | 1.61 |
| Average trading days (Calendar) | 111.0 |
| Annual Return | 156.44% |
| Cumulative Return | 928.86% |
| Average Return per trade | 126.27% |
| Annual volatility | 61.94% |
| Annual Alpha | 0.45 |
| Beta | 0.04 |
| Information Ratio | 0.7 |
| Sharpe Ratio | 0.74 |
| Max Drawdown | -58.22% |
| Daily Value at Risk(99%) | -8.89% |
| Drawdown Control | False |
Compared to the result in 3.3:
Cumulative return drops from 1092.22% to 928.86%; annual return drops from 161.62% to 156.44%; annual alpha decreases from 0.49 to 0.45.
Annual volatility increases from 60.25% to 61.94%; max drawdown increases from 56.94% to 58.22%; daily VaR increases from 8.62% to 8.89%. Beta drops slightly from 0.05 to 0.04. The drawdown is still out of control with MADD=0.7. Moreover, with non-zero bid-ask spread and transaction costs, the minimum MADD that brings the drawdown under control rises from 0.75 to 0.77.
Sharpe ratio drops from 0.79 to 0.74; information ratio from 0.75 to 0.7.
The conclusion is the same as the one on the training set in 3.6: bid-ask spread and transaction costs increase the risk of the strategy while reducing its return. However, with the setting of bid-ask spread of 0.5 and transaction costs of 0.0002, the strategy still outperforms the market.
As before, the only obvious difference from the plots of 3.3 is found in the plot Bid, Ask and Trading Prices of Portfolio: with the bid-ask spread and transaction costs considered, the bid and ask prices of the portfolio are different. The trading price of the portfolio is one or the other, depending on the trader's current position.
The obvious disadvantage of our strategy is the risk. The strategy provides a good return, which significantly surpasses the market. However, the significant risk will discourage many traders who are risk-averse. The maximum drawdown of over 50% and the peak daily VaR(99%) of over 20% are alarming. In this respect, it is a technique more suitable for speculators.
Applying lower stop-loss bounds by setting a smaller $Z_{s}$ should be of great assistance in addressing the risk issue. Nonetheless, it will affect the strategy's profitability and Sharpe ratio, given that the strategy currently provides the highest Sharpe ratio. In practice, one must first identify the level of risk one can handle, such as annual volatility, maximum drawdown, or VaR. Then, with the risk budget fixed, the objective is to identify the parameter settings that maximize profit. The method can thus be adapted for traders with varying risk appetites.
Another disadvantage is the typical position holding time, which is approximately four to five calendar months. This is an exceptionally lengthy arbitrage trading method. In addition to the aforementioned risk, a combination of high risk and long position holding duration may result in heavy collateral/margin pressure and cash turnover issues.
Once the cointegration pair is found, it is difficult to overcome or mitigate this disadvantage because the average position holding duration is a characteristic of the cointegration pair. It could be advantageous to reduce the entry bounds and increase the exit bounds, which would require raising $Z_{e}$ and decreasing $Z$. But this impacts the strategy's profitability.
The possibility of no-trade periods is another disadvantage. With $Z=1.5$, the strategy depends on massive mismatches between the futures prices of Feeder Cattle and Live Cattle to trigger entry into the position and generate a substantial profit. However, arbitrage opportunities involving substantial mismatches do not emerge frequently, particularly during periods when the livestock futures market is stable.
Consequently, traders may have to wait a considerable amount of time (perhaps years) for a trade, which affects the strategy's usability. However, there is a potential advantage: if there are fixed transaction costs in each trading period, the strategy's low trading frequency will preserve the return.
By decreasing the entry boundaries with smaller $Z$, the no-trade problem can be mitigated. This raises the frequency with which trading signals are generated, but increases the risk level and affects the Sharpe ratio.
Liquidity has a significant impact on the effectiveness of the long/short trading strategy. As Live Cattle is the leading variable, its liquidity is substantially more important.
When we wish to trade but no one is selling or buying, we must wait. This produces the same effect as increasing $Z$ and $Z_{s}$ while decreasing $Z_{e}$: there will be fewer trades due to the higher entry bounds and lower exit bounds, and greater volatility and max drawdown due to the higher stop-loss bounds. Because the optimal Sharpe ratio settings have effectively been modified, the Sharpe ratio will decline.
As a property of trading assets, it is difficult to remedy a lack of liquidity by just modifying the settings of boundaries.
Most drawbacks can be alleviated by tuning the bound parameters $Z$, $Z_{e}$ and $Z_{s}$ at the cost of a lower Sharpe ratio. All the disadvantages, however, originate from the features of the cointegrated pair. Selecting a cointegrated pair with better features during the pair-selection procedure is the fundamental solution to all of them.
As mentioned in 2.3.4 and 2.3.5, a potential improvement for the trading strategy is the application of rolling $\beta_{coint}$ or adaptive $\beta_{coint}$ with Kalman Filter for the weights of assets in the construction of the long/short portfolio.
If one wants to use the rolling $\beta_{coint}$, the rolling window size must be carefully chosen to ensure the existence of the cointegration relationship. According to the graphs of T-stats of Rolling ADF Test for 1Y, 2Y, 3Y, 5Y and 7Y and T-stats of Rolling (1-α) for 1Y, 2Y, 3Y, 5Y and 7Y in 2.3.4, the minimum acceptable rolling window size is five years, but seven years is preferable.
The application of either rolling $\beta_{coint}$ or adaptive $\beta_{coint}$ requires adjusting the loading of the strategy. It is expected to increase the return and reduce the risk of the trading strategy, and thus improve its usability and Sharpe ratio. It is highly recommended to use the adaptive $\beta_{coint}$ with the Kalman filter, which generates a significantly more stationary residual, as shown in 2.3.5.
As a means of enhancing the trading strategy, liquidity and algorithmic flow factors can be incorporated. It is possible to construct a model of order flow that takes into account the behaviour of entering and accumulating positions as well as the effect of our transactions on the market order book.[3]
A related issue is the possible leverage of the strategy. While the maximum leverage is 1/Margin, the more adequate measure is the maximally leveraged market-neutral gain, or alpha-to-margin ratio:
$$AM=\frac{\alpha}{Margin}$$

The self-built backtest platform can be improved by adding 'constraints' inputs. Most traders face trading limits, such as margin restrictions, limited capacity to borrow, a maximum acceptable drawdown, and a maximum acceptable VaR. These limits have a significant impact on traders' actual trading activity. They could be added to the backtesting platform as additional inputs (similar to MADD, bid-ask spread, and transaction costs) to produce more realistic backtest results.
As mentioned in 4.1.1, the constraints can form the minimum requirements in the process of optimisation: parameters $Z$, $Z_{e}$ and $Z_{s}$ are optimised without breaking the constraints, i.e. while satisfying the minimum requirements. The resulting trading strategy is more practical and applicable for traders.
The self-built backtest platform only has a rolling $\beta$ on the market return factor. The only built-in factor is the market return, represented by the return of the S&P500 ETF index. One may want to know the $\alpha$ and $\beta$ on other common factors, such as the Fama-French factors SMB and HML or momentum (UMD). Taking this into consideration, the backtest platform could accept factor inputs: users input the data of the factor(s) of interest in DataFrame format, and the platform returns the regression results for the exposures $\beta$ to the input factor(s) and the excess return $\alpha$, together with the corresponding rolling plots of $\beta$(s) and $\alpha$. This improves the usability of the backtest platform.
The aforementioned list of potential enhancements to this trading method is by no means exhaustive. Due to the author's limited knowledge and time, these improvements have not yet been implemented in this project. In general, there are two primary approaches for potential improvements:
Some discussions regarding the project's trading strategy and general statistical arbitrage are offered at the end of the report:
Does the cumulative P&L of the project's strategy behave as expected for a cointegration arbitrage trading strategy? Does the P&L come from a few or many trades? What is the halflife? What is the maximum drawdown, and how do volatility/VaR behave?
For the optimal Sharpe ratio method, the cumulative return behaves as expected: for each trade (except in the case of a stop-loss event), we capture the spread between the two futures prices that hits the entry bounds. The P&L is generated by a few trades with high returns, which in most cases range from 125% to 160%. The half-life is approximately 63.3362 days. The maximum drawdown over the sample dataset is 56.94%. The annual volatility and daily VaR at 99% are 60.25% and 8.62%, respectively. According to the rolling volatility and VaR plots, rolling volatility peaks occur simultaneously with VaR peaks; their trends are identical because both measure risk.
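The risk figures quoted above can be reproduced from a daily return series with a few standard formulas; the sketch below (illustrative helper names, not the project's actual code) computes the maximum drawdown of an equity curve, annualised volatility, historical one-day VaR, and the OU half-life $\ln 2/\theta$:

```python
import numpy as np

def max_drawdown(equity_curve):
    """Maximum peak-to-trough drawdown of a (positive) equity curve."""
    peaks = np.maximum.accumulate(equity_curve)
    return np.max((peaks - equity_curve) / peaks)

def annual_vol(daily_returns, periods=252):
    """Annualised volatility from daily returns (252 trading days assumed)."""
    return np.std(daily_returns, ddof=1) * np.sqrt(periods)

def hist_var(daily_returns, level=0.99):
    """Historical one-day VaR: the loss not exceeded with probability `level`."""
    return -np.quantile(daily_returns, 1 - level)

def ou_half_life(theta):
    """Half-life of an OU process de_t = theta*(mu - e_t)dt + sigma dW_t."""
    return np.log(2) / theta
```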
Is P&L coming from a few large trades or many small trades? Does all profit come from a particular period?
As stated above, the P&L results from a few rather sizable trades. These trades are distributed fairly evenly across periods of large spread/residual volatility in the cointegrated pair's prices. However, there is a particular period with no profit, because the spread is very stable throughout that time and no trades take place.
What impact will the bid-ask spread and transaction costs make?
For trading strategies in general, the bid-ask spread and transaction costs 'eat' into the profit and decrease the total P&L, especially for strategies with a high number of low-profit trades when there are fixed costs. In practice, the bid-ask spread and transaction costs must be carefully considered in order to evaluate the P&L and avoid unanticipated losses.
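As a minimal illustration (the cost model and all parameter values are assumptions, not the project's calibration), the net P&L of a round-trip two-leg pair trade under a proportional bid-ask spread and a fixed per-order fee could be computed as:

```python
def net_trade_pnl(gross_pnl, notional, half_spread, fixed_cost, n_legs=2):
    """Gross P&L of a round-trip pair trade, net of spread and fees.

    half_spread: half the quoted bid-ask spread as a fraction of price,
    paid on every leg's entry and exit; fixed_cost: fee per order.
    """
    spread_cost = 2 * n_legs * half_spread * notional  # entry + exit, per leg
    fee_cost = 2 * n_legs * fixed_cost                 # one fee per order
    return gross_pnl - spread_cost - fee_cost
```

For example, a trade with $1{,}000$ gross profit on $10{,}000$ notional, a 5 bp half-spread, and a $1 fee per order nets $1000 - 20 - 4 = 976$; the same costs applied to many small trades would erase their profit entirely.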
What impact will your strategy make on the market order book?
For small trading quantities, there will be no noticeable influence on the market order book. However, if the trading volume is large enough to move the order book, the effect will be:
If the residual $e_t \ll \mu_{e}$, the Live Cattle future is priced too low and/or the Feeder Cattle future is priced too high relative to the long-term cointegration relationship: the trader longs the portfolio according to the strategy, i.e. longs the Live Cattle future and shorts the Feeder Cattle future. Due to the large trading quantities, the sell orders with the lowest prices are matched, so the new lowest ask price of the Live Cattle future in the order book becomes higher; the buy orders with the highest prices are matched, so the new highest bid price of the Feeder Cattle future becomes lower. Thus the ask price of the Live Cattle future rises and the bid price of the Feeder Cattle future drops; the strategy increases the price of Live Cattle and decreases the price of Feeder Cattle. In this way the residual is pulled up and reverts towards the equilibrium.
If the residual $e_t \gg \mu_{e}$, we short the Live Cattle future and long the Feeder Cattle future (short the portfolio). The bid price of the Live Cattle future decreases and the ask price of the Feeder Cattle future increases in the order book. Thus the residual is pushed down and reverts towards the equilibrium.
Altogether: if the two futures are mispriced in the order book in the sense that the cointegration relationship is violated, the strategy corrects the mispricing by moving the prices so that the residual is pulled back towards the equilibrium.
Wider bounds might give the highest P&L and the lowest number of trades. However, there is a risk of the cointegration relationship breaking down (the strategy is prone to breakouts of the cointegration relationship). How is this balanced?
The optimal Sharpe ratio strategy of this project can be regarded as an example of a 'wide bounds' strategy with high P&L and a low number of trades: $Z$ is set to 1.5, which is quite high. The mechanism to stop further loss caused by a cointegration breakout is the stop-loss bounds controlled by the parameter $Z_{s}$, set to 2.8. If the spread moves farther than $2.8\,\sigma_{eq}$ from the equilibrium, it is regarded as a cointegration breakout and the trade is stopped.
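The bound logic described above can be sketched as follows; this is a simplified illustration showing only the entry and stop-loss bounds (the exit bound $Z_e$ is omitted for brevity, and the function name is an assumption):

```python
def signal(residual, mu, sigma, Z=1.5, Z_stop=2.8):
    """Entry/stop-loss signal for the spread residual.

    Long the portfolio when the residual falls below mu - Z*sigma,
    short when it rises above mu + Z*sigma, and stop out beyond
    mu +/- Z_stop*sigma (treated as a cointegration breakout).
    Z=1.5 and Z_stop=2.8 follow the optimised parameters above.
    """
    z = (residual - mu) / sigma  # distance from equilibrium in sigma units
    if abs(z) > Z_stop:
        return "stop-loss"
    if z > Z:
        return "short"
    if z < -Z:
        return "long"
    return "no-trade"
```

The stop-loss band caps the loss from a breakout at roughly the distance between the entry and stop-loss bounds, which is the balance struck between wide entry bounds and breakout risk.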
Why might rolling beta not be as relevant to statistical arbitrage/alternative algorithmic and market-making strategies?
Our optimal Sharpe ratio long/short trading strategy has an average rolling beta of 0.05 that rarely exceeds 1. Because statistical arbitrage trades arbitrage opportunities rather than market direction, beta is largely irrelevant. The arbitrage opportunity in our strategy is an abnormally large spread in the co-movement of the two futures prices. Since the occurrence and size of these opportunities are independent of market moves and market risk, beta is irrelevant to statistical arbitrage. By the same logic, what market-making strategies trade has little to do with market risk.
The cointegrated relationship is supposed to persist and $\beta_{coint}$ should stay the same: the pair should continue delivering a stationary spread over 3-6 months without needing to be updated. Is this realistic for your pair(s)?
According to 2.3.4, 2.3.5 and 2.3.6, this is realistic for the pair of our strategy. The half-life is 63.3362 trading days, approximately 3 calendar months, which indicates that the residual changes slowly. According to the plots of T-stats of the Rolling ADF Test for 1Y, 2Y, 3Y, 5Y and 7Y and of Rolling $(1-\alpha)$ for 1Y, 2Y, 3Y, 5Y and 7Y, an obvious cointegrated relationship with a relatively stable $\beta_{coint}$ is observed over windows of at least 5 years. Once estimated, $\beta_{coint}$ remains reasonably steady and does not fluctuate significantly over time. Furthermore, the figure "Adaptive $\beta_{coint}$ based on Kalman Filter" demonstrates the relative stability of the adaptive $\beta_{coint}$ as well. This evidence shows that for our selected pair the persistence of $\beta_{coint}$ is quite long.
What are the benefits and disadvantages of regular re-estimation of the cointegrated relationship? Report not only the rolling $\beta_{coint}$, but also the history of the T-statistic for the coefficient in front of the EC term in Engle-Granger Step 2.
The benefits are discussed in 4.2.1.
Suppose we use a rolling $\beta_{coint}$ with an X-year rolling window, re-estimated every Y weeks. As discussed above, the cointegration relationship can only be observed over a relatively long run of historical data, so X could be big and has to be carefully selected (e.g. X=5 for the pair in this project). Another possible disadvantage: if there is a period with a cointegration relationship different from before, after which the relationship reverts to the original, this breakout period will heavily influence the subsequent rolling $\beta_{coint}$ because the rolling window is relatively small. This might produce a false $\beta_{coint}$ for the current period.
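A rolling $\beta_{coint}$ of the kind discussed here can be estimated as a rolling OLS slope; the sketch below (the function name is illustrative, and the window length is the user's choice, e.g. roughly 1260 trading days for X=5) uses rolling covariance over rolling variance:

```python
import numpy as np
import pandas as pd

def rolling_beta_coint(y: pd.Series, x: pd.Series, window: int) -> pd.Series:
    """Rolling OLS slope of y on x (with intercept) over a fixed window.

    The slope of an OLS fit with intercept equals cov(x, y) / var(x),
    so a rolling slope is the ratio of the rolling statistics.
    """
    cov = y.rolling(window).cov(x)
    var = x.rolling(window).var()
    return cov / var  # NaN until the first full window
```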
Currently, I cannot think of any disadvantages to using the adaptive $\beta_{coint}$ based on the Kalman Filter. The only factor to consider is the Kalman Filter parameter settings. Their selection relies heavily on the developer's subjectivity, although the various Kalman Filter parameter values appear to have little impact on our dataset.
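As a sketch of the adaptive estimate, a scalar Kalman filter that treats $\beta_{coint}$ as a random walk observed through $y_t=\beta_t x_t+\varepsilon_t$ could look like the following; the process and observation variances `q` and `r` (and the initial state) are exactly the subjective tuning parameters mentioned above:

```python
import numpy as np

def kalman_beta(x, y, q=1e-5, r=1e-2, beta0=1.0, p0=1.0):
    """Adaptive beta_coint via a scalar Kalman filter.

    State: beta_t follows a random walk with process variance q.
    Observation: y_t = beta_t * x_t + noise with variance r.
    Returns the filtered beta path.
    """
    beta, p = beta0, p0
    out = np.empty(len(x))
    for t in range(len(x)):
        p = p + q                                # predict: inflate variance
        k = p * x[t] / (x[t] ** 2 * p + r)       # Kalman gain
        beta = beta + k * (y[t] - beta * x[t])   # update with the innovation
        p = (1 - k * x[t]) * p                   # posterior variance
        out[t] = beta
    return out
```

Larger `q` makes $\beta_{coint}$ adapt faster but noisier; larger `r` smooths it, which is the trade-off the parameter choice controls.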
However, in both cases, the trader has to devise a mechanism for accumulating positions while trading. With a time-dependent, non-constant $\beta_{coint}$, this becomes more complicated.
Discussions of the rolling $\beta_{coint}$ together with the T statistic for ADF test and the coefficient of the EC term in Engle-Granger Step 2 can be found in 2.3.4. Discussions of the adaptive $\beta_{coint}$ together with the T statistic for ADF test and the coefficient of the EC term in Engle-Granger Step 2 can be found in 2.3.5.
Systematic backtesting: what drives the P&L, and what do you make money on?
According to the systematic backtest, the P&L does not result from exposure to market risk, but rather from arbitrage opportunities created by temporary divergences of the cointegrated pair's prices. The profit is generated by the abnormally wide spreads in their co-movement.
Backtesting: Rolling Sharpe ratio
From the rolling Sharpe ratio graphs, it is evident that the rolling curves are volatile and prone to shifts. This is undesirable because it indicates that the expected return per unit of risk is unstable, reducing the predictability of the trading strategy's P&L over time.
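For reference, an annualised rolling Sharpe ratio of the kind plotted can be computed as below (a zero daily risk-free rate and 252 trading days per year are assumed for simplicity):

```python
import numpy as np
import pandas as pd

def rolling_sharpe(daily_returns: pd.Series, window: int = 252,
                   rf: float = 0.0) -> pd.Series:
    """Annualised rolling Sharpe ratio over `window` trading days."""
    excess = daily_returns - rf
    mean = excess.rolling(window).mean()
    vol = excess.rolling(window).std(ddof=1)
    return np.sqrt(252) * mean / vol
```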
Backtesting: Rolling market factor $\beta$
The rolling $\beta$ of the strategy on the market return is quite low (less than 0.1); the impact of market risk on the overall return is negligible most of the time. Nevertheless, the rolling $\beta$ plots show some periods with absolute values greater than 1. This is unexpected for our strategy, because it indicates sensitivity to market risk during these periods, which is not what we would expect from a statistical arbitrage strategy.
[1]. Efficient Pair Selection for Pair-Trading Strategy, Advanced Financial Data Analysis. Patrick McSharry (2015).
[2]. Cointegration: Modelling Long-Term Relationships, Workings, CQF Lecture. Dr. Richard Diamond.
[3]. Time Series for Pairs Trading, Final Workshop II. Dr. Richard Diamond (2022).
[4]. Co-Integration and Error Correction: Representation, Estimation, and Testing. Econometrica, Vol. 55. Robert Engle and C. W. J. Granger (1987).
[5]. Critical Values for Cointegration Tests. James G. MacKinnon (2010).
[6]. A Drunk and Her Dog: An Illustration of Cointegration and Error Correction. Michael P. Murray (1993).
[7]. Modelling Long Run Relationships in Time Series, CQF Module 6 Lecture 8. Dr. Richard Diamond.
[8]. Coint Backtest DEMO. Dr. Richard Diamond.
[9]. Fundamentals of Kalman Filtering: A Practical Approach, American Institute of Aeronautics and Astronautics. Paul Zarchan and Howard Musoff (2009).
[10]. Products and Convolutions of Gaussian Probability Density Functions. P. A. Bromiley (2013).